Determining themes

ABSTRACT

Determining themes is disclosed. Reputation data extracted from at least one data source is received. The reputation data includes a plurality of user-authored reviews. The presence of a first keyword is detected in a first review. The presence of a second keyword that is different from but associated with the first keyword is detected in a second review. A sentiment for a theme is determined based on the detected presence of the first and second keywords. A report that indicates the sentiment for the theme is provided as output.

CROSS REFERENCE TO OTHER APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/842,376, entitled DETERMINING THEMES filed Mar. 15, 2013 which is incorporated herein by reference for all purposes, which claims priority to U.S. Provisional Patent Application No. 61/666,586 entitled BUSINESS REPUTATION SYSTEM filed Jun. 29, 2012 and to U.S. Provisional Patent Application No. 61/747,340 entitled REVIEW REQUEST AUTOMATION filed Dec. 30, 2012, both of which are incorporated herein by reference for all purposes.

BACKGROUND OF THE INVENTION

Businesses are increasingly concerned with their online reputations, and the reputations of their competitors. For example, both positive and negative reviews posted to a review website can impact revenue. As more review websites are created, and as more users post more content to those sites, it is becoming increasingly difficult for businesses to monitor online information.

BRIEF DESCRIPTION OF THE DRAWINGS

Various embodiments of the invention are disclosed in the following detailed description and the accompanying drawings.

FIG. 1 illustrates an embodiment of an environment in which business reputation information is collected, analyzed, and presented.

FIG. 2 illustrates an example of components included in embodiments of a reputation platform.

FIG. 3 illustrates an embodiment of a process for enrolling a business with a reputation platform.

FIG. 4 illustrates an example of components included in embodiments of a reputation platform.

FIG. 5 illustrates an embodiment of a process for refreshing reputation data.

FIG. 6 illustrates an example of an interface as rendered in a browser.

FIG. 7 illustrates an example of components included in an embodiment of a reputation platform.

FIG. 8 illustrates an embodiment of a process for generating a reputation score.

FIG. 9 illustrates an example of an interface as rendered in a browser.

FIG. 10 illustrates an example of an interface as rendered in a browser.

FIG. 11 illustrates an example of an interface as rendered in a browser.

FIG. 12 illustrates a portion of an interface as rendered in a browser.

FIG. 13 illustrates a portion of an interface as rendered in a browser.

FIG. 14 illustrates an example of an interface as rendered in a browser.

FIG. 15 illustrates a portion of an interface as rendered in a browser.

FIG. 16 illustrates a portion of an interface as rendered in a browser.

FIG. 17 illustrates an example of an interface as rendered in a browser.

FIG. 18 illustrates a portion of an interface as rendered in a browser.

FIG. 19 illustrates a portion of an interface as rendered in a browser.

FIG. 20 illustrates an embodiment of a reputation platform that includes a review request engine.

FIG. 21 illustrates an embodiment of a process for targeting review placement.

FIG. 22 illustrates an example of a target distribution.

FIG. 23 illustrates an example of a target distribution.

FIG. 24 illustrates an embodiment of a process for performing an industry review benchmark.

FIG. 25 illustrates an embodiment of a process for recommending potential reviewers.

FIG. 26 illustrates an embodiment of a process for determining a follow-up action.

FIG. 27 illustrates a portion of an interface as rendered in a browser.

FIG. 28 illustrates an embodiment of a process for stimulating reviews.

FIG. 29 illustrates an example of an interface as rendered in a browser.

FIG. 30 illustrates an example of an interface as rendered in a browser.

FIG. 31 illustrates an example of an interface as rendered in a browser.

FIG. 32 illustrates an example of a popup display of reviews including a term.

FIG. 33 illustrates an alternate example of a popup display of reviews including a term.

FIG. 34 illustrates an example of an interface as rendered in a browser.

FIG. 35 illustrates an example of an interface as rendered in a browser.

FIG. 36 illustrates an embodiment of a process for assigning sentiment to themes.

FIG. 37A illustrates an embodiment of an ontology associated with medical practices.

FIG. 37B illustrates an embodiment of an ontology associated with a restaurant.

FIG. 38 illustrates an example of sentiment being assigned to themes based on three reviews.

FIG. 39 illustrates an example of a process for assigning a sentiment to a theme.

FIG. 40 is a table of example positivity calculations.

FIG. 41A is a portion of a table of themes and scores for an example restaurant.

FIG. 41B is a portion of a table of themes and scores for an example restaurant.

FIG. 41C is a portion of a table of themes and scores for an example restaurant.

FIG. 42 illustrates an example of a sentence included in a review.

FIG. 43 illustrates an example of a sentence included in a review.

FIG. 44 illustrates an example of a sentence included in a review.

FIG. 45 illustrates an example of sentence extractions used in deduplication.

DETAILED DESCRIPTION

The invention can be implemented in numerous ways, including as a process; an apparatus; a system; a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor. In this specification, these implementations, or any other form that the invention may take, may be referred to as techniques. In general, the order of the steps of disclosed processes may be altered within the scope of the invention. Unless stated otherwise, a component such as a processor or a memory described as being configured to perform a task may be implemented as a general component that is temporarily configured to perform the task at a given time or a specific component that is manufactured to perform the task. As used herein, the term ‘processor’ refers to one or more devices, circuits, and/or processing cores configured to process data, such as computer program instructions.

A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. Numerous specific details are set forth in the following description in order to provide a thorough understanding of the invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details. For the purpose of clarity, technical material that is known in the technical fields related to the invention has not been described in detail so that the invention is not unnecessarily obscured.

FIG. 1 illustrates an embodiment of an environment in which business reputation information is collected, analyzed, and presented. In the example shown, the user of client device 106 (hereinafter referred to as “Bob”) owns a single location juice bar (“Bob's Juice Company”). The user of client device 108 (hereinafter referred to as “Alice”) is employed by a national chain of convenience stores (“ACME Convenience Stores”). As will be described in more detail below, Bob and Alice can each access the services of reputation platform 102 (via network 104) to track the reputations of their respective businesses online. The techniques described herein can work with a variety of client devices 106-108 including, but not limited to personal computers, tablet computers, and smartphones.

Reputation platform 102 is configured to collect reputation and other data from a variety of sources, including review websites 110-114, social networking websites 120-122, and other websites 132-134. In some embodiments, users of platform 102, such as Alice and Bob, can also provide offline survey data to platform 102. In the examples described herein, review site 110 is a general purpose review site that allows users to post reviews regarding all types of businesses. Examples of such review sites include Google Places, Yahoo! Local, and Citysearch. Review site 112 is a travel-oriented review site that allows users to post reviews of hotels, restaurants, and attractions. One example of a travel-oriented review site is TripAdvisor. Review site 114 is specific to a particular type of business (e.g., car dealers). Examples of social networking sites 120 and 122 include Twitter and Foursquare. Social networking sites 120-122 allow users to take actions such as “checking in” to locations. Finally, personal blog 134 and online forum 132 are examples of other types of websites “on the open Web” that can contain business reputation information.

Platform 102 is illustrated as a single logical device in FIG. 1. In various embodiments, platform 102 is a scalable, elastic architecture and may comprise several distributed components, including components provided by one or more third parties. Further, when platform 102 is referred to as performing a task, such as storing data or processing data, it is to be understood that a sub-component or multiple sub-components of platform 102 (whether individually or in cooperation with third party components) may cooperate to perform that task.

Account/Business Setup

FIG. 2 illustrates an example of components included in embodiments of a reputation platform. In particular, FIG. 2 illustrates components of platform 102 that are used in conjunction with a business setup process.

In order to access the services provided by reputation platform 102, Bob first registers for an account with the platform. At the outset of the process, he accesses interface 202 (e.g., a web-based interface) and provides information such as a desired username and password. He also provides payment information (if applicable). If Bob has created accounts for his business on social networking sites such as sites 120 and 122, Bob can identify those accounts to platform 102 as well.

Next, Bob is prompted by platform 102 to provide the name of his business (e.g., “Bob's Juice Company”), a physical address of the juice bar (e.g., “123 N. Main St.; Cupertino, Calif. 95014”), and the type of business that he owns (e.g., “restaurant” or “juice bar”). The business information entered by Bob is provided to auto find engine 204, which is configured to locate, across sites 110-114, the respective profiles on those sites pertaining to Bob's business (e.g., “www.examplereviewsite.com/CA/Cupertino/BobsJuiceCo.html”), if present. Since Bob has indicated that his business is a juice bar, reputation platform 102 will not attempt to locate it on site 114 (a car dealer review site), but will attempt to locate it within sites 110 and 112.

In the example shown in FIG. 2, sites 110 and 114 make available respective application programming interfaces (APIs) 206 and 208 that are usable by auto find engine 204 to locate business profiles on their sites. Site 112 does not have a profile finder API. In order to locate a business profile there, auto find engine 204 is configured to perform a site-specific search using a script that accesses a search engine (e.g., through search interface 210). As one example, a query of: “site:www.examplereviewsite.com ‘Bob's Juice Company’ Cupertino” could be submitted to the Google search engine using interface 210.
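As a minimal sketch of how such a site-specific query string might be assembled, consider the following. The function name build_site_query and its parameters are hypothetical and are used for illustration only; the sketch builds the query text and does not submit it to any search engine.

```python
def build_site_query(site_host: str, business_name: str, city: str) -> str:
    """Assemble a site-restricted search query string.

    The "site:" prefix limits results to one host, and the business name
    is quoted so that it is matched as a phrase.
    """
    return f'site:{site_host} "{business_name}" {city}'


query = build_site_query("www.examplereviewsite.com", "Bob's Juice Company", "Cupertino")
print(query)
```

The resulting string could then be passed to whatever search interface is in use (e.g., search interface 210).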

Results obtained by auto find engine 204 are provided to verification engine 212, which confirms that information, such as the physical address and company name provided by Bob, is present in the located profiles. Verification engine 212 can be configured to verify all results (including any obtained from sites 110 and 114), and can also be configured to verify (or otherwise process) just those results obtained via interface 210. As one example, for a given query, the first ten results obtained from search interface 210 can be examined. The result that has the best match score and also includes the expected business name and physical address is designated as the business's profile at the queried site.
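The selection rule described above can be sketched as follows. This is an illustrative example only: the SearchResult type, its fields, and the case-insensitive substring comparison are assumptions, and the way a match score is computed is not specified here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SearchResult:
    url: str            # hypothetical field: URL of the candidate profile
    text: str           # hypothetical field: page text or snippet
    match_score: float  # hypothetical field: relevance score from the search


def _normalize(value: str) -> str:
    # Lowercase and collapse whitespace so comparisons ignore formatting.
    return " ".join(value.lower().split())


def pick_profile(results: Sequence[SearchResult], name: str, address: str,
                 limit: int = 10) -> Optional[SearchResult]:
    """Among the first `limit` results, return the best-scoring one that
    contains both the expected business name and physical address."""
    wanted_name, wanted_address = _normalize(name), _normalize(address)
    candidates = [
        r for r in results[:limit]
        if wanted_name in _normalize(r.text) and wanted_address in _normalize(r.text)
    ]
    return max(candidates, key=lambda r: r.match_score, default=None)
```

If no result contains both the name and the address, the function returns None, which would correspond to no profile being designated for the queried site.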

In some embodiments, verification engine 212 presents results to Bob for verification that the located profiles correspond to his business. As one example, Bob may be shown (via interface 202) a set of URLs corresponding to profiles on each of the sites 110-114 where his business has been located and asked to verify that the profiles are indeed for his business. Once confirmed by Bob, the URLs of the profiles (also referred to herein as “subscriptions”) and any other appropriate data are stored in database 214. Examples of such other data include overview information appearing on the business's profile page (such as a description of the business) and any social data (e.g., obtained from sites 120-122).

In various embodiments, users are given the option by platform 102 to enter the specific URLs corresponding to their business profiles on review sites. For example, if Bob knows the URL of the Google Places page corresponding to his business, he can provide it to platform 102 and use of auto find engine 204 is omitted (or reduced) as applicable.

FIG. 3 illustrates an embodiment of a process for enrolling a business with a reputation platform. In some embodiments process 300 is performed by platform 102. The process begins at 302 when a physical address of a business is received. As one example, when Bob provides the address of his business to platform 102 via interface 202, that address is received at 302. At 304, the received address is used as a query. As one example of the processing performed at 304, the received address is provided to site 110 using API 206. As another example, a site-specific query (e.g., of site 112) is submitted to a search engine via search interface 210.

At 306, results of the query (or queries) performed at 304 are verified. As one example of the processing performed at 306, verification engine 212 performs checks such as confirming that the physical address received at 302 is present in a given result. As another example, a user can be asked to confirm that results are correct, and if so, that confirmation is received as a verification at 306. Finally, at 308, verified results are stored. As one example, URLs for each of the verified profiles are stored in database 214. Although pictured as a single database in FIG. 2, in various embodiments, platform 102 makes use of multiple storage modules, such as multiple databases. Such storage modules may be of different types. For example, user account and payment information may be stored in a MySQL database, while extracted reputation information (described in more detail below) may be stored using MongoDB.

Where a business has multiple locations, the business owner (or a representative of the business, such as Alice) can be prompted to loop through process 300 for each of the business locations. Physical addresses and/or the URLs of the corresponding profiles on sites such as sites 110-114 can also be provided to platform 102 in a batch, rather than by manually entering in information via interface 202. As one example, suppose ACME Convenience Stores has 2,000 locations throughout the United States. Instead of manually entering in the physical location of each of the stores, Alice may instead elect to upload to platform 102 a spreadsheet or other file (or set of files) that includes the applicable information.

Tags associated with each location can also be provided to platform 102 (e.g., as name-value pairs). For example, Alice can tag each of the 2,000 locations with a respective store name (Store #1234), manager name (Tom Smith), region designation (West Coast), brand (ACME-Quick vs. Super-ACME), etc. As needed, tags can be edited and deleted, and new tags can be added. For example, Alice can manually edit a given location's tags (e.g., via interface 202) and can also upload a spreadsheet of current tags for all locations that supersede whatever tags are already present for her locations in platform 102. As will be described in more detail below, the tags can be used to segment the business to create custom reports and for other purposes.
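One possible representation of such name-value tags, and of segmenting locations by a tag, is sketched below. The dictionary layout, the location identifiers, and the function name segment_by_tag are hypothetical and are chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical layout: location identifier -> {tag name: tag value}.
LocationTags = Dict[str, Dict[str, str]]


def segment_by_tag(locations: LocationTags, tag_name: str) -> Dict[Optional[str], List[str]]:
    """Group location identifiers by the value of one tag.

    Locations that lack the tag are grouped under the key None.
    """
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for location_id, tags in locations.items():
        groups[tags.get(tag_name)].append(location_id)
    return dict(groups)


locations = {
    "store-1234": {"manager": "Tom Smith", "region": "West Coast", "brand": "ACME-Quick"},
    "store-5678": {"region": "West Coast", "brand": "Super-ACME"},
    "store-9012": {"region": "East Coast", "brand": "ACME-Quick"},
}
print(segment_by_tag(locations, "region"))
```

Uploading a replacement spreadsheet of tags would, under this representation, amount to overwriting the inner dictionary for each location.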

Ongoing Data Collection and Processing

Once a business (e.g., Bob's Juice Company) has an account on reputation platform 102, and once the various subscriptions (i.e., the URLs of the business's profiles on the various review sites) have been identified and stored in database 214, collecting and processing of review and other data is performed. FIG. 4 illustrates an example of components included in embodiments of a reputation platform. In particular, FIG. 4 illustrates components of platform 102 that are used in conjunction with the ongoing collection and processing of data.

Reputation platform 102 includes a scheduler 402 that periodically instructs collection engine 404 to obtain data from sources such as sites 110-114. In some embodiments, data from sites 120-122, and/or 132-134 is also collected by collection engine 404. Scheduler 402 can be configured to initiate data collection based on a variety of rules. For example, it can cause data collection to occur once a day for all businesses across all applicable sites. It can also cause collection to occur with greater frequency for certain businesses (e.g., which pay for premium services) than others (e.g., which have free accounts). Further, collection can be performed across all sites (e.g., sites 110-114) with the same frequency or can be performed at different intervals (e.g., with collection performed on site 110 once per day and collection performed on site 112 once per week).
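The scheduling rules described above can be illustrated with the following sketch. The per-site intervals mirror the once-per-day and once-per-week example; the site keys, the PREMIUM_SPEEDUP factor, and the function name is_due are assumptions made for illustration and do not describe any particular embodiment of scheduler 402.

```python
from datetime import datetime, timedelta

# Hypothetical per-site collection intervals.
SITE_INTERVALS = {
    "site_110": timedelta(days=1),
    "site_112": timedelta(weeks=1),
}

# Hypothetical factor: premium accounts are refreshed this many times as often.
PREMIUM_SPEEDUP = 4


def is_due(site: str, last_collected: datetime, now: datetime, premium: bool = False) -> bool:
    """Return True when enough time has elapsed to collect from `site` again."""
    interval = SITE_INTERVALS[site]
    if premium:
        interval = interval / PREMIUM_SPEEDUP
    return now - last_collected >= interval
```

A scheduler loop could call is_due for each subscription and hand the due ones to the collection engine.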

In addition to or instead of the scheduled collection of data, data collection can also be initiated based on the occurrence of an arbitrary triggering event. For example, collection can be triggered based on a login event by a user such as Bob (e.g., based on a permanent cookie or password being supplied). Collection can also be triggered based on an on-demand refresh request by the user (e.g., where Bob clicks on a “refresh my data” button in interface 202). Other elements depicted in FIG. 4 will be described in conjunction with process 500 shown in FIG. 5.

FIG. 5 illustrates an embodiment of a process for refreshing reputation data. In some embodiments process 500 is performed by platform 102. The process begins at 502 when a determination is made that a data refresh should be performed. As one example, such a determination is made at 502 by scheduler 402 based on an applicable schedule. As another example, such a determination is made at 502 when a triggering event (such as a login event by Bob) is received by platform 102.

At 504, a determination is made as to which sites should be accessed. As one example, in some embodiments collection engine 404 reviews the set of subscriptions stored in database 214 for Bob's Juice Company. The set of subscriptions associated with Bob's company are the ones that will be used by collection engine 404 during the refresh operation. As previously mentioned, a refresh can be performed on behalf of multiple (or all) businesses, instead of an individual one such as Bob's Juice Company. In such a scenario, portion 504 of the process can be omitted as applicable.

At 506, information is obtained from the sites determined at 504. As shown in FIG. 4, collection engine 404 makes use of several different types of helpers 420-428. Each helper (e.g., helper 420) is configured with instructions to fetch data from a particular type of source. As one example, although site 110 provides an API for locating business profiles, it does not make review data available via an API. Such data is instead scraped by platform 102 accordingly. In particular, when a determination is made that reviews associated with Bob's Juice Company on site 110 should be refreshed by platform 102, an instance 430 of helper 420 is executed on platform 102. Instance 430 is able to extract, for a given entry on site 110, various components such as: the reviewer's name, profile picture, review title, review text, and rating. Helper 424 is configured with instructions for scraping reviews from site 114. It is similarly able to extract the various components of an entry as posted to site 114. Site 112 has made available an API for obtaining review information and helper 422 is configured to use that API.

Other types of helpers can extract other types of data. As one example, helper 426 is configured to extract check-in data from social site 120 using an API provided by site 120. As yet another example, when an instance of helper 428 is executed on platform 102, a search is performed across the World Wide Web for blog, forum, or other pages that discuss Bob's Juice Company. In some embodiments, additional processing is performed on any results of such a search, such as sentiment analysis.

In various embodiments, information, obtained on behalf of a given business, is retrieved from different types of sites in accordance with different schedules. For example, while review site data might be collected hourly, or on demand, social data (collected from sites 120-122) may be collected once a day. Data may be collected from sites on the open Web (e.g., editorials, blogs, forums, and/or other sites not classified as review sites or social sites) once a week.

At 508, any new results (i.e., those not already present in database 214) are stored in database 214. As needed, the results are processed (e.g., by converting reviews into a single, canonical format) prior to being included in database 214. In various embodiments, database 214 supports heterogeneous records and such processing is omitted or modified as applicable. For example, suppose reviews posted to site 110 must include a score on a scale from one to ten, while reviews posted to site 112 must include a score on a scale from one to five. Database 214 can be configured to store both types of reviews. In some embodiments, the raw score of a review is stored in database 214, as is a converted score (e.g., in which all scores are converted to a scale of one to ten). As previously mentioned, in some embodiments, database 214 is implemented using MongoDB, which supports such heterogeneous record formats. As will be described in more detail below, in some embodiments, platform 102 includes a theme engine 434, which is configured to identify themes common across reviews.
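One way a raw score could be converted to a common one-to-ten scale is sketched below. The conversion method is not specified above; this example assumes a linear rescaling that maps the lowest raw value to one and the highest to ten, and the function and field names are hypothetical.

```python
def convert_score(raw: float, scale_min: float, scale_max: float,
                  target_min: float = 1.0, target_max: float = 10.0) -> float:
    """Linearly rescale `raw` from [scale_min, scale_max] to [target_min, target_max]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    if not scale_min <= raw <= scale_max:
        raise ValueError("raw score is outside the source scale")
    fraction = (raw - scale_min) / (scale_max - scale_min)
    return target_min + fraction * (target_max - target_min)


def make_record(site: str, raw: float, scale_min: float, scale_max: float) -> dict:
    """Build a record that keeps both the raw score and the converted score."""
    return {
        "site": site,
        "raw_score": raw,
        "converted_score": convert_score(raw, scale_min, scale_max),
    }


print(make_record("site_112", 3, 1, 5))
```

Under this assumption, a three on the one-to-five scale of site 112 lands at the midpoint of the one-to-ten scale, and scores from site 110 pass through unchanged.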

Prior to the first time process 500 is executed with respect to Bob's Juice Company, no review data is present in database 214. Portion 506 of the process is performed for each of the data sources applicable to Bob's business (via instances of the applicable helpers), and the collected data is stored at 508. On subsequent refreshes of data pertinent to Bob's company, only new/changed information is added to database 214. In various embodiments, alerter 432 is configured to alert Bob (e.g., via an email message) whenever process 500 (or a particular portion thereof) is performed with respect to his business. In some cases, alerts are only sent when new information is observed, and/or when reputation scores associated with Bob's business (described in more detail below) change, or change by more than a threshold amount.
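The incremental storage and conditional alerting described above might be sketched as follows. How a review is recognized as already stored is not specified; this example assumes each review carries a (site, review_id) key. The function names and the default threshold are likewise hypothetical.

```python
from typing import Iterable, List, Set, Tuple

ReviewKey = Tuple[str, str]  # hypothetical key: (site, review_id)


def select_new_reviews(stored_keys: Set[ReviewKey], fetched: Iterable[dict]) -> List[dict]:
    """Return only the fetched reviews whose key is not already stored."""
    return [r for r in fetched if (r["site"], r["review_id"]) not in stored_keys]


def should_alert(new_count: int, old_score: float, new_score: float,
                 threshold: float = 0.0) -> bool:
    """Alert when new information was observed or when the score changed
    by more than `threshold` (a threshold of 0 means any change)."""
    return new_count > 0 or abs(new_score - old_score) > threshold
```

On the first refresh for a business, stored_keys is empty and every fetched review is treated as new.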

Reputation Scoring

Platform 102 is configured to determine a variety of reputation scores on behalf of businesses such as Bob's Juice Company. In the case of multiple-location businesses, such as ACME, individual reputation scores are determined for each of the locations, and the scores of individual businesses can be aggregated in a variety of ways. As will be described in more detail below, the scores provide users with perspective on how their businesses are perceived online. Also as will be described in more detail below, users are able to explore the factors that contribute to their businesses' reputation scores by manipulating various interface controls, and they can also learn how to improve their scores. In the case of multi-location businesses, such as ACME, users can segment the locations in a variety of ways to gain additional insight.

FIG. 6 illustrates an example of an interface as rendered in a browser. In particular, Bob is presented with interface 600 after logging in to his account on platform 102 using a browser application on client device 106 and clicking on tab option 602.

In region 604 of interface 600, a composite reputation score (728 points) is depicted on a scale 606. Example ways of computing a composite score are described in conjunction with FIG. 7. The composite reputation score provides Bob with a quick perspective on how Bob's Juice Company is perceived online. A variety of factors can be considered in determining a composite score. Six example factors are shown in region 608, each of which is discussed below. For each factor, Bob can see tips on how to improve his score with respect to that factor by clicking on the appropriate box (e.g., box 622 for tips on improving score 610). In the example shown in FIG. 6, a recommendation box is present for each score presented in region 608. In some embodiments, such boxes are only displayed for scores that can/should be improved. For example, given that score 614 is already very high, in some embodiments, box 626 is omitted from the interface as displayed to Bob, or an alternate message is displayed, such as a general encouragement to “keep up the good work.”

Overall Score (610): This value reflects the average review score (e.g., star rating) across all reviews on all review sites. As shown, Bob's business has an average rating of 0.50 across all sites. If Bob clicks on box 622, he will be presented with a suggestion, such as the following: “Overall score is the most influential metric. It can appear in both the review site search results and in your general search engine results. Generating a larger volume of positive reviews is the best way to improve the overall score. Typically, volume is the best approach as your average, happy customer will not write a review without being asked.” Additional, personalized advice may also be provided, such as telling Bob he should click on tab 634 and request five reviews.

Timeliness (612): This score indicates how current a business's reviews are (irrespective of whether they are positive or negative). In the example shown, reviews older than two months have less of an impact than more recent reviews. Thus, if one entity has 200 reviews with an average rating of four stars, at least some of which were recently authored, and a second entity has the same volume and star rating but none of the reviews were written in the last two months, the first entity will have a higher timeliness score and thus a higher composite reputation score. If Bob clicks on box 624, he will be presented with a suggestion, such as the following: “Managing your online reviews is not a one-time exercise, but a continual investment into your business. Encourage a steady trickle of new reviews on a regular basis to ensure that your reviews don't become stale.” Other measures of Timeliness can also be used, such as a score that indicates the relative amount of new vs. old positive reviews and new vs. old negative reviews. (I.e., to see whether positive or negative reviews dominate in time.)

Length (614): This score indicates the average length of a business's reviews. Longer reviews add weight to the review's rating. If two reviews have the same star rating (e.g., one out of five stars), but the first review is ten words and the second review is 300 words, the second review will be weighted more when computing the composite score. If Bob clicks on box 626, he will be presented with a suggestion, such as the following: “Encourage your positive reviewers to write in-depth reviews. They should detail their experiences and highlight what they like about your business. This provides credibility and the guidance makes review writing easier for them.” Other measures of Length can also be used, such as a score that indicates the relative amount of long vs. short positive reviews and long vs. short negative reviews. (I.e., to see whether positive or negative reviews dominate in length.)

Social Factors (616): Reviews that have been marked with social indicators (e.g., they have been marked by other members of the review community as being “helpful” or “funny”) will have more bearing on the outcome of the composite score. By clicking on box 632, Bob will be presented with an appropriate suggestion for improvement.

Reviewer Authority (618): A review written by an established member of a community (e.g., who has authored numerous reviews) will have a greater impact on the outcome of the composite score than one written by a reviewer with little or no history on a particular review site. In some embodiments, the audience of the reviewer is also taken into consideration. For example, if the reviewer has a large Twitter following, his or her review will have a greater bearing on the outcome of the score. If Bob clicks on box 628, he will be presented with a suggestion, such as the following: “Established reviewers can be a major boon to your review page. Their reviews are rarely questioned and their opinions carry significant weight. If you know that one of your customers is an active reviewer on a review site, make a special effort to get him or her to review your business.”

Industry (620): Review sites that are directly related to the vertical in which the entity being reviewed resides are given more weight. For example, if the entity being reviewed is a car dealership and the review site caters specifically to reviews about car dealerships, the reviews in that specific site will have a greater impact on the outcome of the composite score than those on vertically ambiguous websites. If Bob clicks on box 630, he will be presented with a suggestion, such as the following: “The most important review sites for your business should have your best reviews. Monitor your website analytics to find the sites having the biggest impact on your business, and reinforce your presence on those sites.”

In various embodiments of interface 600, additional controls for interactions are made available. For example, a control can be provided that allows a user to see individual outlier reviews: reviews that contributed the most to/deviated the most from the overall score (and/or individual factors). As one example, a one-star review that is weighted heavily in the calculation of a score or scores can be surfaced to the user. The user could then attempt to resolve the negative feelings of the individual that wrote the one-star review by contacting the individual. As another example, a particularly important five-star review (e.g., due to being written by a person with a very high reviewer authority score) can be surfaced to the user, allowing the user to contact the reviewer and thank him or her. As yet another example, if an otherwise influential review is stale (and positive), the review can be surfaced to the user so that the user can ask the author to provide an update or otherwise refresh the review.

A variety of weights can be assigned to the above factors when generating the composite score shown in region 604. Further, the factors described above need not all be employed nor need they be employed in the manners described herein. Additional factors can also be used when generating a composite score. An example computation of a composite score is discussed in conjunction with FIG. 7.

Example Score Generation

FIG. 7 illustrates an example of components included in an embodiment ofa reputation platform. In particular, FIG. 7 illustrates components ofplatform 102 that are used in conjunction with generating reputationscores.

In some embodiments, whenever Bob accesses platform 102 (and/or based on the elapsing of a certain amount of time), the composite score shown at 604 in FIG. 6 is refreshed. In particular, scoring engine 702 retrieves, from database 214, review and other data pertaining to Bob's business and generates the various scores shown in FIG. 6. Example ways of computing a composite reputation score are as follows.

(1) Base Score

First, scoring engine 702 computes a base score “B” that is a weighted average of all of the star ratings of all of the individual reviews on all of the sites deemed relevant to Bob's business:

$B = 100 \cdot \frac{\sum_{i}^{N_{r}} s_{i} w_{i}}{\sum_{i}^{N_{r}} w_{i}} \cdot \Theta( N_{r} - N_{\min} )$

where “N_(r)” is the total number of reviews, “s_(i)” is the number of “stars” for review “i” normalized to 10, “w_(i)” is the weight for review “i,” Θ is the Heaviside step function, and “N_(min)” is the minimum number of reviews needed to score (e.g., 4). The factor 100 is used to expand the score to a value from 0 to 1000.
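As a minimal sketch, the base score computation described above might be expressed in Python as follows. The function name `base_score`, the treatment of the step function at exactly “N_(min)” reviews (taken here to be 1), and the handling of an all-zero weight sum are assumptions made for illustration.

```python
from typing import Sequence


def base_score(stars: Sequence[float], weights: Sequence[float], n_min: int = 4) -> float:
    """Weighted-average base score B on a 0 to 1000 scale.

    stars   -- star ratings already normalized to a 0 to 10 scale (s_i)
    weights -- per-review weights (w_i)
    n_min   -- minimum number of reviews needed to score (N_min)
    """
    if len(stars) != len(weights):
        raise ValueError("stars and weights must have the same length")
    n_r = len(stars)
    # Heaviside step: assumed to be 1 when N_r >= N_min, otherwise 0.
    if n_r < n_min:
        return 0.0
    total_weight = sum(weights)
    if total_weight == 0:
        # Assumption: no usable weight means no score.
        return 0.0
    weighted_average = sum(s * w for s, w in zip(stars, weights)) / total_weight
    return 100.0 * weighted_average
```

For example, four equally weighted reviews with normalized ratings of 10, 8, 6, and 8 would yield a base score of 800, while a business with only three reviews would not be scored under the example threshold of 4.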

One example of the function “w_(i)” is as follows:

$w_{i} = D_{A} \cdot T_{i} \cdot P_{i} \cdot R_{A} \cdot S_{F} \cdot L_{F}$

In the above, “D_(A)” is the domain authority, which reflects how important the domain is with respect to the business. As one example, a doctor-focused review site may be a better authority for reviews of doctors than a general purpose review site. One way to determine domain authority values is to use the domain's search engine results page placement using the business name as the keyword.

“R_(A)” is the reviewer authority. One way to determine reviewer authority is to take the logarithm of 1 + the number of reviews written by the reviewer. As explained above, a review written by an individual who has authored many reviews is weighted more than one written by a less prolific user.

“S_(F)” is the social feedback factor. One way to determine the factor is to use the logarithm of 1 + the number of pieces of social feedback a review has received.

“L_(F)” is the length factor. One way to specify this value is to use 1 for short reviews, 2 for medium reviews, and 4 for long reviews.

“T_(i)” is the age factor. One way to specify this factor is as follows: if the age “a_(i)” (in months) is less than two months, T_(i)=1; if the age is greater than two months, the following value is used:

$T_{i} = \max( e^{- \omega \cdot (a_{i} - 2)}, 0.5 )$

where ω is the time-based decay rate.

“P_(i)” is the position factor for review “i.” The position factor indicates where a given review is positioned among other reviews of the business (e.g., it is at the top on the first page of results, or it is on the tenth page). One way to compute the position factor is as follows:

$P_{i} = e^{- \frac{p_{i}}{\lambda}}$

where λ is the positional decay length.
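The weight factors described above can be sketched in Python as follows. The function and parameter names, the default values chosen for ω (`omega`) and λ (`decay_length`), and the use of the natural logarithm are assumptions for illustration; the domain authority and length factor are passed in directly because the text describes them only by example.

```python
import math


def age_factor(age_months: float, omega: float) -> float:
    """T_i: 1 for reviews up to two months old, then exponential decay floored at 0.5."""
    if age_months <= 2:
        return 1.0
    return max(math.exp(-omega * (age_months - 2)), 0.5)


def position_factor(position: float, decay_length: float) -> float:
    """P_i = exp(-p_i / lambda), where lambda is the positional decay length."""
    return math.exp(-position / decay_length)


def review_weight(domain_authority: float, age_months: float, position: float,
                  reviews_by_author: int, feedback_count: int, length_factor: float,
                  omega: float = 0.1, decay_length: float = 10.0) -> float:
    """w_i = D_A * T_i * P_i * R_A * S_F * L_F.

    omega and decay_length defaults are hypothetical values, not from the text.
    """
    reviewer_authority = math.log(1 + reviews_by_author)  # R_A
    social_feedback = math.log(1 + feedback_count)         # S_F
    return (domain_authority
            * age_factor(age_months, omega)
            * position_factor(position, decay_length)
            * reviewer_authority
            * social_feedback
            * length_factor)
```

Taken literally, the logarithmic factors evaluate to zero for a review with no social feedback (or by an author with no counted reviews), which zeroes the product; the sketch reproduces the formulas as written rather than guessing at an offset.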

In some cases, a given site (e.g., site 110) may have an overall rating given for the business on the main profile page for that business on the site. In some embodiments, the provided overall rating is treated as an additional review with age a=a₀ and position p=p₀ and given an additional weight factor of 2.

(2) Normalization

Once the base score has been computed, it is normalized (to generate “B_(norm)”). In some embodiments this is performed by linearly stretching the range of scores from 8 to 10 onto the range 5 to 10, and linearly squeezing the range of scores from 0 to 8 onto the range 0 to 5.
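The piecewise-linear normalization described above can be sketched as follows. The sketch assumes the score is expressed on the 0 to 10 scale used in the description of the ranges (e.g., the base score divided by 100); the function name is hypothetical.

```python
def normalize_base_score(score: float) -> float:
    """Map [0, 8] linearly onto [0, 5] and [8, 10] linearly onto [5, 10].

    Assumes the input is on a 0 to 10 scale.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError("score must be between 0 and 10")
    if score <= 8.0:
        # Squeeze the lower range: slope 5/8.
        return score * 5.0 / 8.0
    # Stretch the upper range: slope 5/2, starting from 5 at score 8.
    return 5.0 + (score - 8.0) * 5.0 / 2.0
```

Under this mapping the two segments meet at 8 (which maps to 5), so the normalized score remains continuous while differences among highly rated businesses are spread over a wider range.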

Optional Correction Factors

In some embodiments, a correction factor “C” is used for the number of reviews in a given vertical and locale:

$C = a + b \cdot \frac{2}{\pi} \tan^{-1}\left( \frac{2 \cdot N_{r}}{\overline{N_{r}}} \right)$

where “N_(r)” is the number of reviews for the business and $\overline{N_{r}}$ is the median number of reviews for the business's vertical and locale. An example value for “a” is 0.3 and an example value for “b” is 0.7.

One alternate version of correction factor “C” is as follows:

$C = a + b \cdot \frac{2}{\pi} \tan^{-1}\left( \frac{2 \cdot N_{r}}{\min( \max( \overline{N_{r}}, N_{\min} ), N_{\max} )} \right)$

where “N_(min)” and “N_(max)” are the limits put on the comparator $\overline{N_{r}}$ (the median number of reviews) in the denominator of the argument of the arctan in the correction factor. An example value for “N_(min)” is 4 and an example value for “N_(max)” is 20.
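Both versions of the correction factor can be sketched with a single Python function, in which the optional limits select the alternate version. The function and parameter names are assumptions; the default values of “a” and “b” are the example values given above.

```python
import math
from typing import Optional


def correction_factor(n_r: float, median_n_r: float, a: float = 0.3, b: float = 0.7,
                      n_min: Optional[float] = None,
                      n_max: Optional[float] = None) -> float:
    """C = a + b * (2/pi) * arctan(2 * N_r / comparator).

    The comparator is the median review count for the vertical and locale,
    optionally clamped to [n_min, n_max] as in the alternate version.
    """
    comparator = median_n_r
    if n_min is not None:
        comparator = max(comparator, n_min)
    if n_max is not None:
        comparator = min(comparator, n_max)
    if comparator <= 0:
        raise ValueError("comparator must be positive")
    return a + b * (2.0 / math.pi) * math.atan(2.0 * n_r / comparator)
```

With the example values, the factor is 0.3 for a business with no reviews, 0.65 when the business has half the median number of reviews, and approaches (but does not reach) 1.0 as the review count grows.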

A randomization correction “R” can also be used:

$R = \min\left( 1000, C \cdot B_{norm} + \frac{\operatorname{mod}( uid, 40 ) - 20}{N_{r}} \right)$

where “C” is a correction factor (e.g., one of the two discussed above), “B_(norm)” is the normalized base score discussed above, and “uid” is a unique identifier assigned to the business by platform 102 and stored in database 214. The randomization correction can be used where only a small number of reviews are present for a given business.

Another example of “R” is as follows:

$R = \max( 0, C \cdot B_{norm} - 37.5 \cdot e^{- 0.6 \cdot \alpha} )$

where “α” is the age of the most recent review.
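The two example forms of “R” can be sketched in Python as shown below. The function names are hypothetical, “uid” is assumed to be a non-negative integer, “B_(norm)” is assumed to already be expressed on the 0 to 1000 scale implied by the cap in the first formula, and the unit of “α” is left to the caller because the text does not state it.

```python
import math


def randomized_score(c: float, b_norm: float, uid: int, n_r: int) -> float:
    """R = min(1000, C * B_norm + (mod(uid, 40) - 20) / N_r).

    The uid-derived offset lies in [-20, 19] and shrinks as N_r grows.
    """
    if n_r <= 0:
        raise ValueError("n_r must be positive")
    return min(1000.0, c * b_norm + ((uid % 40) - 20) / n_r)


def age_adjusted_score(c: float, b_norm: float, latest_review_age: float) -> float:
    """R = max(0, C * B_norm - 37.5 * exp(-0.6 * alpha)), implemented as written."""
    return max(0.0, c * b_norm - 37.5 * math.exp(-0.6 * latest_review_age))
```

Because the offset in the first form depends only on the stored identifier, it is stable across recomputations for a given business, and its influence diminishes as more reviews accumulate.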

Additional Examples of Scoring Embodiments

As explained above, a variety of techniques can be used by scoring engine 702 in determining reputation scores. In some embodiments, scores for all types of businesses are computed using the same sets of rules. In other embodiments, reputation score computation varies based on industry (e.g., reputation scores for car dealers using one approach and/or one set of factors, and reputation scores for doctors using a different approach and/or different set of factors). Scoring engine 702 can be configured to use a best in class entity when determining appropriate thresholds/values for entities within a given industry. The following are yet more examples of factors that can be used in generating reputation scores.

Review volume: The volume of reviews across all review sites can be used as a factor. For example, if the average star rating and the number of reviews are high, a conclusion can be reached that the average star rating is more accurate than where an entity has the same average star rating and a lower number of reviews. The star rating will carry more weight in the score if the volume is above a certain threshold. In some embodiments, thresholds vary by industry. Further, review volume can use more than just a threshold. For example, an asymptotic function of number of reviews, industry, and geolocation of the business can be used as an additional scoring factor.

Multimedia: Reviews that have multimedia associated with them (e.g., a video review, or a photograph) can be weighted differently. In some embodiments, instead of using a separate multimedia factor, the length score of the review is increased (e.g., to the maximum value) when multimedia is present.

Review Distribution: The population of reviews on different sites can be examined, and where a review distribution strays from the mean distribution, the score can be impacted. As one example, if the review distribution is sufficiently outside the expected distribution for a given industry, this may indicate that the business is engaged in gaming behavior. The score can be discounted (e.g., by 25%) accordingly. An example of advice for improving a score based on this factor would be to point out to the user that their distribution of reviews (e.g., 200 on site 110 and only 2 on site 112) deviates from what is expected in the user's industry, and suggest that the user encourage those who posted reviews to site 110 to do so on site 112 as well.

Text Analysis: Text analysis can be used to extract features used in the score. For example, reviews containing certain key terms (e.g., “visited” or “purchased”) can be weighted differently than those that do not.
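As one illustration of the review distribution factor listed above, the following sketch discounts a score when the business's distribution of reviews across sites deviates too far from an expected distribution. The text does not say how deviation is measured; the use of total variation distance, the threshold of 0.4, and all names below are assumptions. The 25% discount is the example value given above.

```python
from typing import Dict


def total_variation(actual: Dict[str, float], expected: Dict[str, float]) -> float:
    """Half the sum of absolute differences between two site-proportion maps."""
    sites = set(actual) | set(expected)
    return 0.5 * sum(abs(actual.get(s, 0.0) - expected.get(s, 0.0)) for s in sites)


def apply_distribution_discount(score: float, actual: Dict[str, float],
                                expected: Dict[str, float],
                                threshold: float = 0.4,
                                discount: float = 0.25) -> float:
    """Discount the score when the distribution is sufficiently far from expected.

    threshold is a hypothetical cutoff; discount follows the 25% example.
    """
    if total_variation(actual, expected) > threshold:
        return score * (1.0 - discount)
    return score
```

Other distance measures or a graduated discount could be substituted without changing the overall shape of the check.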

FIG. 8 illustrates an embodiment of a process for generating a reputation score. In some embodiments, process 800 is performed by platform 102. The process begins at 802 when data obtained from each of a plurality of sites is received. As one example, process 800 begins at 802 when Bob logs into platform 102 and, in response, scoring engine 702 retrieves data associated with Bob's business from database 214. In addition to generating reputation scores on demand, scores can also be generated as part of a batch process. As one example, scores across an entire industry can be generated (e.g., for benchmark purposes) once a week. In such situations, the process begins at 802 when the designated time to perform the batch process occurs and data is received from database 214. In various embodiments, at least some of the data received at 802 is obtained on-demand directly from the source sites (instead of or in addition to being received from a storage, such as database 214).

At 804, a reputation score for an entity is generated. Various techniques for generating reputation scores are discussed above. Other approaches can also be used, such as by determining an average score for each of the plurality of sites and combining those average scores (e.g., by multiplying or adding them and normalizing the result). As mentioned above, in some embodiments the entity for which the score is generated is a single business (e.g., Bob's Juice Company). The score generated at 804 can also be determined as an aggregate across multiple locations (e.g., in the case of ACME Convenience Stores) and can also be generated across multiple businesses (e.g., reputation score for the airline industry), and/or across all reviews hosted by a site (e.g., reputation score for all businesses with profiles on site 110). One way to generate a score for multiple locations (and/or multiple businesses) is to apply scoring techniques described in conjunction with FIG. 7 using as input the pool of reviews that correspond to the multiple locations/businesses. Another way to generate a multi-location and/or multi-business reputation score is to determine reputation scores for each of the individual locations (and/or businesses) and then combine the individual scores (e.g., through addition, multiplication, or other appropriate combination function).

Finally, at 806 the reputation score is provided as output. As one example, a reputation score is provided as output in region 604 of interface 600. As another example, scoring engine 702 can be configured to send reputation scores to users via email (e.g., via alerter 432).

Enterprise Reputation Information

As explained above, in addition to providing reputation information for single location businesses, such as Bob's Juice Company, platform 102 can also provide reputation information for multi-location businesses (also referred to herein as “enterprises”). Examples of enterprises include franchises, chain stores, and any other type of multi-location business. The following section describes various ways that enterprise reputation information is made available by platform 102 to users, such as Alice, who represent such enterprises.

FIG. 9 illustrates an example of an interface as rendered in a browser. In particular, Alice is presented with interface 900 after logging in to her account on platform 102 using a browser application on client 108. Alice can also reach interface 900 by clicking on tab option 902. By default, Alice is presented in region 912 with a map of the United States that highlights the average performance of all ACME locations within all states. In various embodiments, other maps are used. For example, if an enterprise only has stores in a particular state or particular county, a map of that state or county can be used as the default map. As another example, a multi-country map can be shown as the default for global enterprises. Legend 914 indicates the relationship between state color and the aggregate performance of locations in that state. Controls 928 allow Alice to take actions such as specifying a distribution list, printing the map, and exporting a CSV file that includes the ratings/reviews that power the display.

Presented in region 916 is the average reputation score across all 2,000 ACME stores. Region 918 indicates that ACME stores in Alaska have the highest average reputation score, while region 920 indicates that ACME stores in Nevada have the lowest average reputation score. A list of the six states in which ACME has the lowest average reputation scores is presented in region 922, along with the respective reputation scores of ACME in those states. The reputation scores depicted in interface 900 can be determined in a variety of ways, including by using the techniques described above.

The data that powers the map can be filtered using the dropdown boxes shown in region 904. The view depicted in region 906 will change based on the filters applied, and the scores and other information presented in regions 916-922 will refresh to correspond to the filtered locations/time ranges. As shown, Alice is electing to view a summary of all review data (authored in the last year), across all ACME locations. Alice can refine the data presented by selecting one or more additional filters (e.g., limiting the data shown to just those locations in California, or to just those reviews obtained from site 110 that pertain to Nevada locations). The filter options presented are driven by the data, meaning that only valid values will be shown. For example, if ACME does not have any stores in Wyoming, Wyoming will not be shown in dropdown 910. As another example, once Alice selects “California” from dropdown 910, only Californian cities will be available in dropdown 930. To revert back to the default view, Alice can click on “Reset Filters” (926).

Some of the filters available to Alice (e.g., 908) make use of the tags that she previously uploaded (e.g., during account setup). Other filters (e.g., 910) are automatically provided by platform 102. In various embodiments, which filters are shown in region 904 are customizable. For example, suppose ACME organizes its stores in accordance with “Regions” and “Zones” and that Alice labeled each ACME location with its appropriate Region/Zone information during account setup. Through an administrative interface, Alice can specify that dropdowns for selecting “Region” and “Zone” should be included in region 904. As another example, Alice can opt to have store manager or other manager designations available as a dropdown filter. Optionally, Alice could also choose to hide certain dropdowns using the administrative interface.

Suppose Alice would like to learn more about the reputation of ACME's California stores. She hovers (or clicks) her mouse on region 924 of the map and interface 900 updates into interface 1000 as illustrated in FIG. 10, which includes a more detailed view for the state. In particular, pop-up 1002 is presented and indicates that across all of ACME's California stores, the average reputation score is 3. Further, out of the 24 California cities in which ACME has stores, the stores in Toluca Lake, Studio City, and Alhambra have the highest average reputation scores, while the stores in South Pasadena, Redwood City, and North Hollywood have the lowest average reputation scores. Alice can segment the data shown in interface 1000 by selecting California from dropdown 1006 and one or more individual cities from dropdown 1004 (e.g., to show just the data associated with stores in Redwood City).

Alice can view more detailed information pertaining to reviews and ratings by clicking tab 932. Interface 1100 makes available, in region 1102, the individual reviews collected by platform 102 with respect to the filter selections made in region 1104. Alice can further refine which reviews are shown in region 1102 by interacting with checkboxes 1112. Summary score information is provided in region 1106, and the number of reviews implicated by the filter selections is presented in region 1108. Alice can select one of three different graphs to be shown in region 1110. As shown in FIG. 11, the first graph shows how the average rating across the filtered set of reviews has changed over the selected time period. If Alice clicks on region 1114, she will be presented with the second graph. As shown in FIG. 12, the second graph shows the review volume over the time period. Finally, if Alice clicks on region 1116, she will be presented with the third graph. As shown in FIG. 13, the third graph shows a breakdown of reviews by type (e.g., portion of positive, negative, and neutral reviews).

If Alice clicks on tab 934, she will be presented with interface 1400 of FIG. 14, which allows her to view a variety of standard reports by selecting them from regions 1402 and 1406. Alice can also create and save custom reports. One example report is shown in region 1404. In particular, the report indicates, for a given date range, the average rating on a normalized (to 5) scale. A second example report is shown in FIG. 15. Report 1500 depicts the locations in the selected date range that are declining in reputation most rapidly. In particular, what is depicted is the set of locations that have the largest negative delta in their respective normalized rating between two dates. A third example report is shown in FIG. 16. Report 1600 provides a summary of ACME locations in a list format. Column 1602 shows each location's average review score, normalized to a 5 point scale. Column 1604 shows the location's composite reputation score (e.g., computed using the techniques described in conjunction with FIG. 7). If desired, Alice can instruct platform 102 to email reports such as those listed in region 1402. In particular, if Alice clicks on tab 940, she will be presented with an interface that allows her to select which reports to send, to which email addresses, and on what schedule. As one example, Alice can set up a distribution list that includes the email addresses of all ACME board members and can further specify that the board members should receive a copy of the “Location vs. Competitors” report once per week.
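The selection underlying a report such as report 1500 (locations with the largest negative delta in normalized rating between two dates) can be sketched as follows. The function name, the dictionary inputs keyed by location, and the `top_n` cutoff are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def fastest_declining(ratings_start: Dict[str, float],
                      ratings_end: Dict[str, float],
                      top_n: int = 5) -> List[Tuple[str, float]]:
    """Return up to top_n (location, delta) pairs with the most negative rating change.

    Only locations with a rating on both dates are compared.
    """
    deltas = {
        location: ratings_end[location] - ratings_start[location]
        for location in ratings_start
        if location in ratings_end
    }
    declining = [(loc, delta) for loc, delta in deltas.items() if delta < 0]
    declining.sort(key=lambda pair: pair[1])  # most negative first
    return declining[:top_n]
```

Locations whose rating improved or stayed flat are omitted, so the result lists only declining locations, ordered from steepest decline.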

If Alice clicks on tab 936, she will be presented with interface 1700, depicted in FIG. 17. Interface 1700 shows data obtained by platform 102 from social sites such as sites 120-122. As with the review data, Alice can apply filters to the social data by interacting with the controls in region 1702 and can view various reports by interacting with region 1704.

Requesting Reviews

If Alice clicks on tab 938, she will be presented with the interface shown in FIG. 18, which allows her to send an email request for a review. Once an email has been sent, the location is tracked and available in interface 1900, shown in FIG. 19. In the example shown in FIG. 18, Alice is responsible for making decisions such as who to request reviews from, and how frequently, based on tips provided in region 1802 (and/or her own intuition). In various embodiments, platform 102 includes a review request engine that is configured to assist businesses in strategically obtaining additional reviews. In particular, the engine can guide businesses through various aspects of review solicitation, and can also automatically make decisions on the behalf of those businesses. Recommendations regarding review requests can be presented to users in a variety of ways. For example, interface 600 of FIG. 6 can present a suggestion that additional reviews be requested, if applicable. As another example, periodic assessments can be made on behalf of a business, and an administrator of the business alerted via email when additional reviews should be solicited.

FIG. 20 illustrates an embodiment of a reputation platform that includes a review request engine. Platform 2000 is an embodiment of platform 102. Other components (e.g., as depicted in FIGS. 2 and/or 4 as being included in platform 102) can also be included in platform 2000 as applicable. As will be described in more detail below, review request engine 2002 is configured to perform a variety of tasks. For example, review request engine 2002 can determine which sites (e.g., site 110 or site 112) a given business would benefit from having additional reviews on. In various embodiments, platform 102 performs these determinations at least in part by determining how a business's reputation score would change (whether positive or negative) based on simulating the addition of new reviews to various review sites. Further, review request engine 2002 can determine which specific individuals should be targeted as potential reviewers, and can facilitate contacting those individuals, including by suggesting templates/language to use in the requests, as well as the timing of those requests.

Targeting Review Placement

As explained above (e.g., in the section titled “Additional Examples of Scoring Embodiments”), one factor that can be considered in determining a reputation score for a business is the “review distribution” of the business's reviews. As one example, suppose a restaurant has a review distribution as follows: Of the total number of reviews of the restaurant that are known to platform 102, 10% of those reviews appear on travel-oriented review site 112, 50% of those reviews appear on general purpose review site 110, and 40% of those reviews appear (collectively) elsewhere. In various embodiments, review request engine 2002 is configured to compare the review distribution of the business to one or more target distributions and use the comparison to recommend the targeting of additional reviews.

A variety of techniques can be used to determine the target distributions used by review request engine 2002. For example, as will be described in more detail below, in some embodiments, reputation platform 102 is configured to determine industry-specific review benchmarks. The benchmarks can reflect industry averages or medians, and can also reflect outliers (e.g., focusing on data pertaining to the top 20% of businesses in a given industry). Further, for a single industry, benchmarks can be calculated for different regions (e.g., one for Restaurants-West Coast and one for Restaurants-Mid West). The benchmark information determined by platform 102 can be used to determine target distributions for a business. Benchmark information can also be provided to platform 102 (e.g., by a third party), rather than or in addition to platform 102 determining the benchmark information itself. In some embodiments, a universal target distribution (e.g., equal distribution across all review sites, or specific predetermined distributions) is used globally across all industries.

If a business has a review distribution that is significantly different from a target distribution (e.g., the industry-specific benchmark), the “review distribution” component of the business's reputation score will be negatively impacted. In various embodiments, review request engine 2002 uses a business's review distribution and one or more target distributions to determine on which site(s) additional reviews should be sought.

FIG. 21 illustrates an embodiment of a process for targeting review placement. In some embodiments process 2100 is performed by review request engine 2002. The process begins at 2102 when an existing distribution of reviews for an entity is evaluated across a plurality of review sites. A determination is made, at 2104, that the existing distribution should be adjusted. Finally, at 2106, an indicator of at least one review site on which placement of at least one additional review should be targeted is provided as output.

One example of process 2100 is as follows: Once a week, the review distribution for a single location dry cleaner (“Mary's Dry Cleaning”) is determined by platform 102. In particular, it is determined that approximately 30% of Mary's reviews appear on site 110, approximately 30% appear on site 112, and 40% of Mary's reviews appear elsewhere (2102). Suppose a target distribution for a dry cleaning business is: 70% site 110, 10% site 112, and 20% remainder. Mary's review distribution is significantly different from the target, and so, at 2104 a determination is made that adjustments to the distribution should be sought. At 2106, review request engine 2002 provides as output an indication that Mary's could use significantly more reviews on site 110. The output can take a variety of forms. For example, platform 102 can send an email alert to the owner of Mary's Dry Cleaning informing her that she should visit platform 102 to help correct the distribution imbalance. As another example, the output can be used internally to platform 2000, such as by feeding it as input into a process such as process 2500.
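The comparison in the Mary's Dry Cleaning example can be sketched as follows. The function name, the dictionary representation of distributions, and the 5-percentage-point tolerance used to decide that a gap is significant are assumptions; the text does not define what counts as “significantly different.”

```python
from typing import Dict, List


def sites_to_target(current: Dict[str, float], target: Dict[str, float],
                    tolerance: float = 0.05) -> List[str]:
    """Return sites whose share of reviews falls short of the target share.

    A site is reported only if its shortfall exceeds the (hypothetical)
    tolerance; results are ordered from largest shortfall to smallest.
    """
    shortfalls = {site: target[site] - current.get(site, 0.0) for site in target}
    under = [(site, gap) for site, gap in shortfalls.items() if gap > tolerance]
    under.sort(key=lambda pair: pair[1], reverse=True)
    return [site for site, _ in under]


# Distributions from the example: 30/30/40 observed versus a 70/10/20 target.
marys = {"site 110": 0.30, "site 112": 0.30, "other": 0.40}
dry_cleaning_target = {"site 110": 0.70, "site 112": 0.10, "other": 0.20}
recommended = sites_to_target(marys, dry_cleaning_target)
```

With the figures from the example, `recommended` contains only site 110, matching the output described at 2106.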

As will be described in more detail below, in some embodiments, the target distribution is multivariate, and includes, in addition to a proportion of reviews across various sites, information such as target timeliness for the reviews, a review volume, and/or a target average score (whether on a per-site basis, or across all applicable sites). Multivariate target distributions can also be used in process 2100. For example, suppose that after a few weeks of requesting reviews (e.g., using process 2100), the review distribution for Mary's Dry Cleaning is 68% site 110, 12% site 112, and 20% remainder (2102). The site proportions in her current review distribution are quite close to the target. However, other aspects of her review distribution may nonetheless deviate significantly from aspects of a multivariate target and need adjusting to bring up her reputation score. For example, the industry target may be a total of 100 reviews (i.e., total review volume) and Mary's Dry Cleaning may only have 80 total reviews. Or, the industry target average age of review may be six months, while the average age for Mary's Dry Cleaning is nine months. Decisions made at 2104 to adjust the existing review distribution can take into account such non-site-specific aspects as well. In some embodiments these additional aspects of a target distribution are included in the distribution itself (e.g., within a multivariate distribution). In other embodiments, the additional information is stored separately (e.g., in a flat file) but is nonetheless used in conjunction with process 2100 when determining which sites to target for additional reviews. Additional information regarding multivariate distribution targets is provided below (e.g., in the section titled “Industry Review Benchmarking”).

Another example of process 2100 is as follows: Once a week, the review distribution of each location of a ten-location franchise is determined (2102). Comparisons against targets can be done individually on behalf of each location, e.g., with ten comparisons being performed against a single, industry-specific target. Comparisons can also be performed between the locations. For example, of the ten locations, the location having the review distribution that is closest to the industry-specific target can itself be used to create a review target for the other stores. The review distributions of the other stores can be compared against the review distributions of the top store, instead of or in addition to being compared against the industry target.

In some embodiments, additional processing is performed in conjunction with process 2100. For example, as part of (or prior to) portion 2102 of the process, a determination can be made as to whether or not the entity has a presence on (e.g., has a registered account with) each of the sites implicated in the target distribution. If an entity is expected to have a non-zero number of reviews on a given site (in accordance with the target distribution), having a presence on that site is needed. As one example, a car dealer business should have an account on review site 114 (a car dealer review site). A restaurant need not have an account on the site, and indeed may not qualify for an account on the site. If the car dealer business does not have an account with site 114, a variety of actions can be taken by platform 102. As one example, an alert that the car dealer is not registered with a site can be emailed to an administrator of the car dealer's account on platform 102. As another example, the output provided at 2106 can include, e.g., in a prominent location, a recommendation that the reader of the output register for an account with site 114. In some embodiments, platform 102 is configured to register for an account on (or otherwise obtain a presence on) the site, on behalf of the car dealer.

Industry Review Benchmarking

As discussed above, review request engine 2002 can use a variety of target distributions, obtained in a variety of ways, in performing process 2100. Two examples of target distributions are depicted in FIGS. 22 and 23, respectively.

The target distributions shown in FIG. 22 are stored as groups of lines (2202, 2204) in a single flat file, where an empty line is used as a delimiter between industry records. The first line (e.g., 2206) indicates the industry classification (e.g., Auto Dealership). The second line (e.g., 2208) indicates a target review volume across all websites (e.g., 80). The third line (e.g., 2210) indicates the industry average review rating, normalized to a 5 point scale (e.g., 3.5). The fourth line (e.g., 2212) indicates for how long a review will be considered “fresh” (e.g., 1 year) and thus count in the calculation of the reputation score of a business in that industry. In some embodiments, in addition to or instead of a specific freshness value, a decay factor is included that is used to reduce the impact of a particular review in the calculation of a business's reputation score over time. The remaining lines of the group (2214-2218) indicate what percentage of reviews should appear on which review sites. For example, 40% of reviews should appear on general purpose review site 110; 10% of reviews should appear on travel review site 112; and 50% of reviews should appear on a review site focused on auto dealers.
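A flat file organized as described above could be parsed along the following lines. The exact line syntax is not given in the text, so the format used here (a bare number on each of lines two through four, freshness expressed in years, and `site name=percentage` for the remaining lines) is an assumption, as are all names. The Auto Dealership values follow the examples above; the restaurant site percentages are placeholders, since the text states only that they differ.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndustryTarget:
    industry: str
    target_volume: int
    average_rating: float      # normalized to a 5 point scale
    freshness_years: float
    site_percentages: Dict[str, float]


def parse_targets(text: str) -> List[IndustryTarget]:
    """Parse blank-line-delimited industry records (hypothetical line syntax)."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        industry, volume, rating, freshness, *site_lines = lines
        sites = {}
        for ln in site_lines:
            name, pct = ln.rsplit("=", 1)
            sites[name.strip()] = float(pct)
        records.append(IndustryTarget(industry, int(volume), float(rating),
                                      float(freshness), sites))
    return records


# Restaurant percentages below are placeholders, not values from the text.
SAMPLE = """\
Auto Dealership
80
3.5
1
site 110=40
site 112=10
auto dealer site=50

Restaurants
100
4
2
site 110=60
site 112=40
"""

targets = parse_targets(SAMPLE)
```

Keeping each record as a small typed object makes it straightforward to add further fields, such as a decay factor, without changing the delimiter convention.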

As shown in FIG. 22, different industries can have different values in their respective records. For example, a target review volume for restaurants is 100 (2220), the industry average review rating is 4 (2222), and the freshness value is two years (2224). The target review distribution is also different.

The target distributions depicted in FIG. 22 can be used to model the impact that additional reviews would have for a business. For example, for a given car dealer business, simulations of additional reviews (e.g., five additional positive reviews obtained on site 110 vs. three additional positive reviews obtained on site 112) can be run, and a modeled reputation score (e.g., using techniques described in “Example Score Generation” above) determined. Whichever simulation results in the highest reputation score can be used to generate output at 2106 in process 2100.
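The simulation described above can be sketched as a comparison of candidate scenarios under a pluggable scoring function. The scoring function shown, `toy_score`, is a deliberately simplified stand-in (a site-weighted average with hypothetical per-site weights) and not the scoring described in “Example Score Generation”; all names are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Review = Tuple[str, float]  # (site, star rating normalized to 0-10)

# Hypothetical per-site weights standing in for a fuller weighting scheme.
SITE_WEIGHT: Dict[str, float] = {"site 110": 2.0, "site 112": 1.0}


def toy_score(reviews: List[Review]) -> float:
    """Site-weighted average rating scaled to 0-1000 (illustrative only)."""
    total_weight = sum(SITE_WEIGHT.get(site, 1.0) for site, _ in reviews)
    weighted = sum(SITE_WEIGHT.get(site, 1.0) * stars for site, stars in reviews)
    return 100.0 * weighted / total_weight


def best_scenario(existing: List[Review],
                  scenarios: Dict[str, List[Review]],
                  score_fn: Callable[[List[Review]], float]) -> Tuple[str, float]:
    """Score each scenario's reviews added to the existing pool; return the best."""
    results = {name: score_fn(existing + extra) for name, extra in scenarios.items()}
    best = max(results, key=results.get)
    return best, results[best]


existing_reviews = [("site 110", 6.0), ("site 112", 6.0)]
candidate_scenarios = {
    "five on site 110": [("site 110", 10.0)] * 5,
    "three on site 112": [("site 112", 10.0)] * 3,
}
winner, modeled_score = best_scenario(existing_reviews, candidate_scenarios, toy_score)
```

Passing the scoring function as a parameter keeps the scenario comparison independent of whichever scoring embodiment is in use.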

FIG. 23 illustrates another example of a target distribution. For a given business, the first two columns of table 2300 list an industry (2302) and sub-industry (2304). The next column lists the target review volume (2306). The remaining columns provide target review proportions with respect to each of sites 2308-2324. As shown in FIG. 23, many of the cells in the table are empty, indicating that, for a given type of business, only a few review sites significantly impact the reputations of those businesses. For example, while car dealers and car rental businesses are both impacted by reviews on sites 110-114 (2308-2312), reviews on site 2322 (a dealer review site) are important to car dealers, but not important to car rental businesses (or entirely different industries, such as restaurants). As another example, reviews of hospitals appearing on a health review site 2314 are almost as important as reviews appearing on site 110. However, reviews appearing on site 2314 are considerably less important to elder care businesses, while reviews on a niche nursing review site 2318 matter for nursing homes but not hospitals.

A small subset of data that can be included in a distribution (also referred to herein as an industry table) is depicted in FIG. 23. In various embodiments, hundreds of rows (i.e., industries/sub-industries) and hundreds of columns (i.e., review sites) are included in the table. Further, additional types of information can be included in table 2300, such as freshness values, review volume over a period of time (e.g., three reviews per week), decay factors, average scores, etc.

As previously explained, target distributions can be provided to platform 102 in a variety of ways. As one example, an administrator of platform 102 can manually configure the values in the file depicted in FIG. 22. As another example, the top business in each category (i.e., the business having the highest reputation score) can be used as a model, and its values copied into the appropriate area of the file depicted in FIG. 22, whether manually or programmatically. As yet another example, process 2400 can be used to generate target distribution 2300.

FIG. 24 illustrates an embodiment of a process for performing an industry review benchmark. In some embodiments, process 2400 is performed by industry benchmarking module 2006 to create/maintain industry table 2300. For example, benchmarking module 2006 can be configured to execute process 2400 once a month. Benchmarking module 2006 can also execute process 2400 more frequently, and/or can execute process 2400 at different times with respect to different industries (e.g., with respect to automotive industries one day each week and with respect to restaurants another day each week), selectively updating portions of table 2300 instead of the entire table at once. In some embodiments, process 2400 is performed multiple times, resulting in multiple tables. For example, platform 102 can be configured to generate region-specific tables.

The process begins at 2402 when review data is received. As one example, at 2402, industry benchmarker 2006 queries database 214 for information pertaining to all automotive sales reviews. For each automotive sales business (e.g., a total of 16,000 dealers), summary information such as each dealer's current reputation score, current review distribution, and current review volume is received at 2402.

At 2404, the received data is analyzed to determine one or more benchmarks. As one example, benchmarker 2006 can be configured to average the information received at 2402 into a set of industry average information (i.e., the average reputation score for a business in the industry; the averaged review distribution; and the average review volume). Benchmarker 2006 can also be configured to consider only a portion of the information received at 2402 when determining a benchmark, and/or can request information for a subset of businesses at 2402. As one example, instead of determining an industry average at 2404, benchmarker 2006 can consider the information pertaining to only those businesses having reputation scores in the top 20% of the industry being benchmarked. In some embodiments, multiple benchmarks are considered (e.g., in process 2100) when making determinations. For example, both an industry average benchmark and a “top 20%” benchmark can be considered (e.g., by being averaged themselves) when determining a target distribution for a business.
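By way of illustration only, the averaging described at 2404 might be sketched as follows. The record layout (`score`, `volume`, and a per-site `distribution` mapping), the function names `benchmark` and `combine`, and the site labels are assumptions made for this example and do not appear in the figures.

```python
from statistics import mean


def benchmark(businesses, top_fraction=None):
    """Average the score, volume, and per-site distribution of a set of businesses.

    businesses: list of dicts with hypothetical keys 'score', 'volume', and
    'distribution' (a mapping of review site -> proportion of reviews).
    top_fraction: if given (e.g., 0.2), only the highest-scoring fraction is used.
    """
    if not businesses:
        raise ValueError("no businesses to benchmark")
    pool = sorted(businesses, key=lambda b: b["score"], reverse=True)
    if top_fraction is not None:
        keep = max(1, round(len(pool) * top_fraction))
        pool = pool[:keep]
    sites = sorted({site for b in pool for site in b["distribution"]})
    return {
        "score": mean(b["score"] for b in pool),
        "volume": mean(b["volume"] for b in pool),
        # A site missing from a business's distribution counts as zero.
        "distribution": {
            site: mean(b["distribution"].get(site, 0.0) for b in pool)
            for site in sites
        },
    }


def combine(first, second):
    """Average two benchmarks (e.g., industry average and top 20%)."""
    sites = sorted(set(first["distribution"]) | set(second["distribution"]))
    return {
        "score": mean([first["score"], second["score"]]),
        "volume": mean([first["volume"], second["volume"]]),
        "distribution": {
            site: mean([first["distribution"].get(site, 0.0),
                        second["distribution"].get(site, 0.0)])
            for site in sites
        },
    }
```

Calling `benchmark(dealers)` yields an industry-average benchmark, `benchmark(dealers, top_fraction=0.2)` yields a “top 20%” benchmark, and `combine` averages the two. Other ways of weighting or combining benchmarks are equally possible.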

In some embodiments, additional processing is performed at 2404 and/or occurs after 2404. For example, a global importance of a review site (e.g., its PageRank or Alexa Rank) is included as a factor in the target distribution, or is used to weight a review site's values in table 2300.

In various embodiments, the industry benchmarked during process 2400 is segmented and multiple benchmarks are determined (e.g., one benchmark for each segment, along with an industry-wide benchmark). As one example, suppose the industry being benchmarked is Fast Food Restaurants. In some embodiments, in addition to an industry-wide benchmark, benchmarks are determined for various geographic sub-regions. One reason for performing regional benchmarking is that different populations of people may rely on different review websites for review information. For example, individuals on the West Coast may rely heavily on site 112 for reviews of restaurants, while individuals in the Midwest may rely heavily on a different site. In order to improve its reputation score, a restaurant located in Ohio will likely benefit from a review distribution that more closely resembles that of other Midwestern restaurants than a nationwide average distribution.

Reviewer Recommendation

FIG. 25 illustrates an embodiment of a process for recommending potential reviewers. In some embodiments, process 2500 is performed by review request engine 2002. The process begins at 2502 when a list of potential reviewers is received. The list can be received in a variety of ways. As one example, a list of potential reviewers can be received at 2502 in response to, or in conjunction with, the processing performed at 2106. As another example, a business, such as a car dealership, can periodically provide platform 102 a list of new customers (i.e., those people who have recently purchased cars) including those customers' email addresses (at 2502). As yet another example, a business can provide to platform 102 a comprehensive list of all known customers (e.g., those subscribed to the business's email newsletters and/or gleaned from past transactions). In some embodiments, customer email addresses are stored in database 214 (2008), and a list of reviewers is received at 2502 in response to a query of database 214 being performed.

At 2504, a determination is made that at least one individual on the received list should be targeted with a review request. A variety of techniques can be used to make this determination. As one example, all potential reviewers received at 2502 could be targeted (e.g., because the list received at 2502 includes an instruction that all members be targeted). As another example, suppose as a result of process 2100, a determination was made that a business would benefit from more reviews on Google Places. At 2504, any members of the list received at 2502 that have Google email addresses (i.e., @gmail.com addresses) are selected. One reason for such a selection is that the individuals with @gmail.com addresses will be more likely to write reviews on Google Places (because they already have accounts with Google). A similar determination can be made at 2504 with respect to other domains, such as by selecting individuals with @yahoo.com addresses when additional reviews on Yahoo! Local are recommended.
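The domain-based selection at 2504 could be sketched, purely as an illustration, as shown below. The mapping `SITE_EMAIL_DOMAINS` and the function `select_by_domain` are hypothetical names introduced for this example; only the two site/domain pairings come from the description above.

```python
# Hypothetical mapping from a review site to the email domains whose owners
# are likely to already hold an account with that site.
SITE_EMAIL_DOMAINS = {
    "Google Places": {"gmail.com"},
    "Yahoo! Local": {"yahoo.com"},
}


def select_by_domain(emails, target_site):
    """Return the addresses whose domain is associated with target_site."""
    domains = SITE_EMAIL_DOMAINS.get(target_site, set())
    selected = []
    for email in emails:
        # rpartition splits on the last "@", so the domain is the final part.
        _, _, domain = email.rpartition("@")
        if domain.lower() in domains:
            selected.append(email)
    return selected
```

For a list containing one @gmail.com address and one @yahoo.com address, requesting candidates for “Google Places” returns only the former.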

Whether or not an individual has already registered with a review site can also be determined (and therefore used at 2504) in other ways. For example, some review sites may provide an API that allows platform 102 to confirm whether an individual with a particular email address has an account with that review site. The API might return a “yes” or “no” response, and may also return a user identifier if applicable (e.g., responding with “CoolGuy22” when presented with a particular individual's email address). As another example, where the site does not provide such an API, a third party service may supply mappings between email addresses and review site accounts to platform 102. As yet another example, the automobile dealer could ask the purchaser for a list of review sites on which the purchaser has accounts, and/or could present the customer with a list of review sites and ask the customer to indicate which, if any, the customer is registered with.

In various embodiments, any review site accounts/identifiers determined to be associated with the customer are stored in database 214 in a profile for the individual. Other information pertinent to the individual can also be included in the profile, such as the number of reviews the user has written across various review sites, the average rating per review, and verticals (e.g., health or restaurants) associated with those reviews.

Additional/alternate processing is performed at 2504 in various embodiments. As one example, database 214 can be queried for information pertaining to each of the potential reviewers received at 2502 and an analysis can be performed on the results. Individuals who have a history of writing positive reviews in general, of writing positive reviews in the same vertical, of writing positive reviews in a different vertical, of frequently writing reviews, or of writing high quality reviews (e.g., having a certain minimum length or including multimedia) irrespective of whether the review itself is positive, can be selected. Individuals with no histories and/or with any negative aspects to their review histories can be removed from consideration, as applicable. In some embodiments, an examination of the potential reviewer (e.g., an analysis of his or her existing reviews) is performed on demand, in conjunction with the processing of 2504. In other embodiments, reviewer evaluations are performed asynchronously, and previously-performed assessments (e.g., stored in database 214) are used in evaluating potential reviewers at 2504.

In various embodiments, review request engine 2002 is configured to predict a likelihood that a potential reviewer will author a review and to determine a number of reviews to request to arrive at a target number of reviews. For example, suppose a company would benefit from an additional five reviews on site 110 and that there is a 25% chance that any reviewer requested will follow through with a review. In some embodiments, engine 2002 determines that twenty requests should be sent (i.e., five reviews divided by a 25% response rate, sent to twenty individuals selected from the list received at 2502). Further, various thresholding rules can be employed by platform 102 when performing the determination at 2504. For example, a determination may have been made (e.g., as an outcome of process 2100) that a business would benefit from fifty additional reviews being posted to site 110. However, it may also be the case that site 110 employs anti-gaming features to identify and neutralize excessive/suspicious reviews. In some embodiments, platform 102 determines limits on the number of requests to be made and/or throttles the rate at which they should be made at 2504.
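The arithmetic in the preceding example (five desired reviews at a 25% response rate implying twenty requests) and a simple form of throttling can be sketched as follows. The function names, the optional cap, and the fixed per-day batch size are assumptions chosen for illustration; the description does not prescribe a particular limiting rule.

```python
import math


def requests_needed(target_reviews, response_rate, max_requests=None):
    """Number of requests to send so the expected review count meets the target."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # The small epsilon guards against floating-point noise such as
    # 3 / 0.1 evaluating to slightly more than 30.
    needed = math.ceil(target_reviews / response_rate - 1e-9)
    if max_requests is not None:
        needed = min(needed, max_requests)  # hypothetical hard limit
    return needed


def daily_batches(total_requests, per_day):
    """Split the requests into per-day batches of at most per_day each."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return [min(per_day, total_requests - sent)
            for sent in range(0, total_requests, per_day)]
```

With a target of five reviews and a 25% response rate, `requests_needed(5, 0.25)` returns 20, and `daily_batches(20, 8)` spreads those requests over three days as batches of 8, 8, and 4.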

At 2506, transmission of a review request to a potential reviewer is facilitated. The processing of 2506 can be performed in a variety of ways. As one example, all potential reviewers determined at 2504 can be emailed identical review request messages by platform 102, in accordance with a template 2010 stored on platform 102. Information such as the name of the business to be reviewed, and the identity of each potential reviewer is obtained from database 214 and used to fill in appropriate fields of the template. In various embodiments, different potential reviewers of a given business receive different messages from platform 102. For example, the message can include a specific reference to one or more particular review site(s), e.g., where the particular reviewer has an account. Thus one potential reviewer might receive a message including the phrase, “please review us on Site 110,” while another might receive a message including the phrase, “please review us on Site 112.” In various embodiments, multiple review sites are mentioned in the request, and the position of the respective site varies across the different requests sent to different potential reviewers. For example, the request can include a region such as region 1804 as depicted in FIG. 18. The ordering of the sites can be based on factors such as the concentration of new reviews needed to maximize a business's score increase, and/or factors such as where the potential reviewer already has an account and/or is otherwise most likely to complete a review.

Where statistical information is known about the potential reviewer (e.g., stored in database 214 is information that the reviewer typically writes reviews in the evening or in the morning), that information can be used in conjunction with facilitating the transmission of the review request (e.g., such that the review request is sent at the time of day most likely to result in the recipient writing a review). Where statistical information is not known about the specific potential reviewer, statistical information known about other individuals can be used for decision-making. Different potential reviewers can also be provided messages in different formats. For example, some reviewers can be provided with review request messages via email, while other reviewers can be provided with review requests via social networking websites, via postal mail, or other appropriate contact methods.

In various embodiments, A/B testing is employed by platform 102 in message transmission. For example, a small number of requests can be sent, some at one time of day and the others at a different time of day (or sent on different days of the week, or with different messaging). Follow-up engine 2004 can be configured to determine, after a period of time (e.g., 24 hours), how many of the targeted reviewers authored reviews, and to use that information as feedback in generating messages for additional potential reviewers. Other information pertaining to the message transmission (and its reception) can also be tracked. For example, message opens and message click throughs (and their timing) can be tracked and stored in database 214 (2012).

Follow-Up Determination

FIG. 26 illustrates an embodiment of a process for determining a follow-up action. In some embodiments, process 2600 is performed by platform 102. The process begins at 2602 when a transmission of a review request is facilitated. In some embodiments, portion 2506 of process 2500 and portion 2602 of process 2600 are the same.

At 2604, a determination is made that the potential reviewer, to whom the review request was transmitted at 2602, has not responded to the request by creating a review. In some embodiments, portion 2604 of process 2600 is performed by follow-up engine 2004. As one example, when an initial review request is sent (e.g., at 2506), information (2012) associated with that request is stored in database 214. Follow-up engine 2004 periodically monitors appropriate review sites to determine whether the potential reviewer has created a review. If engine 2004 determines that a review was authored, in some embodiments, no additional processing is performed by follow-up engine 2004 (e.g., beyond noting that a review has been created and collecting statistical information about the review, such as the location of the review, and whether the review is positive or negative). In other embodiments, platform 102 takes additional actions, such as by sending the reviewer a thank you email. In the event it is determined that no review has been created (2604), follow-up engine 2004 determines a follow-up action to take regarding the review request.

A variety of follow-up actions can be taken, and can be based on a variety of factors. As one example, follow-up engine 2004 can determine, from information 2012 (or any other appropriate source), whether the potential reviewer opened the review request email. The follow-up engine can also determine whether the potential reviewer clicked on any links included in the email. Follow-up engine 2004 can select different follow-up actions based on these determinations. For example, if the potential reviewer did not open the email, one appropriate follow-up action is to send a second request, with a different subject line (i.e., in the hopes the potential reviewer will now open the message). If the potential reviewer opened the email, but didn't click on any links, an alternate message can be included in a follow-up request. If the potential reviewer opened the email and clicked on a link (but did not author a review), another appropriate action can be selected by follow-up engine 2004 as applicable, such as by featuring a different review site, or altering the message included in the request. Another example of a follow-up action includes contacting the potential reviewer using a different contact method than the originally employed one. For example, where a request was originally sent to a given potential reviewer via email, follow-up engine 2004 can determine that a follow-up request be sent to the potential reviewer via a social network, or via a physical postcard. Another example of a follow-up action includes contacting the potential reviewer at a different time of day than was employed in the original request (e.g., if the request was originally sent in the morning, send a follow-up request in the evening).
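One possible, simplified rendering of the open/click decision logic described above is sketched below. The `RequestStatus` record, its field names, and the action labels are hypothetical and stand in for whatever tracking information (2012) an embodiment actually stores.

```python
from dataclasses import dataclass


@dataclass
class RequestStatus:
    """Hypothetical tracking record for one review request."""
    opened: bool
    clicked: bool
    reviewed: bool


def choose_follow_up(status):
    """Pick a follow-up action from what is known about the original request."""
    if status.reviewed:
        return "none"  # a review exists; an embodiment might send a thank-you
    if not status.opened:
        return "resend_with_new_subject_line"
    if not status.clicked:
        return "resend_with_alternate_message"
    # Opened and clicked, but no review was authored.
    return "feature_different_review_site"
```

Other factors mentioned above, such as switching the contact method or the time of day, could be layered onto the same decision in a similar manner.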

In various embodiments, follow-up engine 2004 is configured to determine a follow-up schedule. For example, based on historical information (whether about the potential reviewer, or based on information pertaining to other reviewers), follow-up engine 2004 may determine that a reminder request (asking that the potential reviewer write a review) should be sent on a particular date and/or at a particular time to increase the likelihood of a review being authored by the potential reviewer. Follow-up engine 2004 can also determine other scheduling optimizations, such as how many total times requests should be made before being abandoned, and/or what the conditions are for ceasing to ask the potential reviewer for a review. In various embodiments, A/B testing is employed (e.g., with respect to a few potential reviewers that did not write reviews) by follow-up engine 2004 to optimize follow-up actions.

FIG. 27 illustrates a portion of an interface as rendered in a browser. In particular, interface 2700 provides feedback (e.g., to a business owner) regarding two six-week periods of a review request campaign that includes follow-up. As shown, the current campaign has led to approximately twice as many “click throughs” (2702) while not resulting in any additional “opt-outs” (2704). Further, the current campaign has resulted in nearly triple the number of reviews (2706) being written.

Stimulating Reviews at a Point of Sale

One problem for some businesses, such as fast food restaurants, is that visiting such restaurants and receiving the expected quality of service/food is sufficiently routine/mundane that most people will not bother to write a positive review of their experience on a site, such as site 112. Only where people experience a significant problem will they be sufficiently motivated to author a review, leading to an overall body of reviews that is likely unfairly negative.

FIG. 28 illustrates an embodiment of a process for stimulating reviews. In some embodiments, process 2800 is performed on a device (e.g., one having interface 2900). The process begins at 2802 when a user is prompted to provide a review at a point of sale. In various embodiments, businesses make available devices that visitors can use to provide feedback while they are at the business. For example, a visitor can be handed a tablet and asked for feedback prior to leaving. As another example, a kiosk can be placed on premise and visitors can be asked to visit and interact with the kiosk.

Illustrated in FIG. 29 is an interface 2900 to such devices. In region 2902, the visitor is asked to provide a rating. In region 2904, the visitor is asked to provide additional feedback. And, in region 2906, the visitor is asked to provide an email address and identify other information, such as the purpose of the visitor's visit. In region 2908, the visitor is offered an incentive for completing the review (but is not required to provide a specific type of review (e.g., positive review)). When the visitor has completed filling out the information requested in interface 2900, the visitor is asked to click button 2910 to submit the review. When the visitor clicks button 2910, the device receives the review data (at 2804 of process 2800). Finally, at 2806, the device transmits the visitor's review data to platform 102.

In various embodiments, platform 102 is configured to evaluate the review data. If the review data indicates that the visitor is unhappy (e.g., a score of one or two), a remedial action can be taken, potentially while the visitor is still in the store. For example, a manager can be alerted that the visitor is unhappy and can attempt to make amends in person. As another example, the manager can write to the visitor as soon as possible, potentially helping resolve/defuse the visitor's negativity prior to the visitor reaching a computer (e.g., at home or at work) and submitting a negative review to site 112. In various embodiments, platform 102 is configured to accept business-specific rules regarding process 2800. For example, a representative of a business can specify that, for that business, “negative” is a score of one through three (i.e., including neutral reviews) or that a “positive” is a score of 4.5 or better. The business can also specify which actions should be taken (e.g., by having a manager alerted to positive reviews, not just negative reviews).
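The business-specific thresholds described above can be illustrated with a short sketch. The function name `classify_score` and its parameters are assumptions for this example; the default thresholds mirror the “one or two” and “four or five” examples given in this section, and the overrides mirror the “one through three” and “4.5 or better” examples.

```python
def classify_score(score, negative_max=2.0, positive_min=4.0):
    """Label a point-of-sale review score using configurable thresholds."""
    if negative_max >= positive_min:
        raise ValueError("negative_max must be below positive_min")
    if score <= negative_max:
        return "negative"
    if score >= positive_min:
        return "positive"
    return "neutral"
```

Under the defaults, a score of three is “neutral.” A business that treats one through three as negative would pass `negative_max=3.0`, and one that requires 4.5 or better for a positive would pass `positive_min=4.5`. The resulting label could then drive the action taken, such as alerting a manager.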

If the review data indicates that the visitor is happy (e.g., a score of four or five), a different action can be taken. As one example, platform 102 can automatically contact the visitor (via the visitor's self-supplied email address), provide a copy of the visitor's review information (supplied via interface 2900), and ask that the visitor post the review to a site such as site 110 or site 112. As another example, if the visitor is still interacting with the device at the time, platform 102 can instruct the device to ask the visitor for permission to post the review on the visitor's behalf. As needed, the device and/or platform 102 can facilitate the posting (e.g., by obtaining the user's credentials for a period of time).

Themes

In various embodiments, techniques described herein are used to identify products, services, or other aspects of a business that reviewers perceive positively or negatively. These perceptions are also referred to herein as “themes.” One example of a theme is “rude.” Another example of a theme is “salty fries.”

FIG. 30 illustrates an example of an interface as rendered in a browser. In particular, interface 3000 is an embodiment of a dashboard display (e.g., displayed to Alice when she clicks on link 3002). As will be described in more detail below, a variety of techniques can be used to determine themes that are common across reviews, as well as their sentiment (e.g., positive, negative, or neutral). In various embodiments, system 102 is configured to use a rating accompanying a review when assigning sentiment, rather than (or in addition to) an underlying connotation of a term.

As one example, the phrase, “sales tactics” might carry a negative (or neutral) connotation in typical conversational use. If an author of a five (out of five) star review uses the expression, however, the author is likely indicating that “sales tactics” were a positive thing encountered about the business being reviewed. As another example, the term, “rude,” has a negative connotation in typical conversational use. Its presence in a five star review can indicate that rudeness at a given establishment is not a problem. As yet another example, the term, “cheap,” can have a positive or neutral connotation (e.g., indicating something is inexpensive) but can also have a negative connotation (e.g., “cheap meat” or “cheap quality”). A rating accompanying a review can be used to determine whether “cheap” is being used as a pejorative term. As yet another example, the phrase, “New Mexico is not known for its sushi,” would typically be considered to express a negative sentiment (e.g., when analyzed using traditional sentiment analysis techniques). Where the phrase appears in a 5 star review, however, the author is likely expressing delight at having found a good sushi restaurant in New Mexico. Using the techniques described herein, the review author's sentiment (positive) will accurately be reflected in determining sentiment for a theme, such as “food” for the sushi restaurant being reviewed.
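A minimal sketch of rating-driven sentiment, in which the star rating accompanying a review (rather than the connotation of its words) determines the sentiment attributed to a theme, is given below. The averaging rule, the thresholds in `label`, the tokenization, and the example theme/keyword groupings are all assumptions made for illustration, not the specific computation of any embodiment.

```python
import re
from collections import defaultdict


def label(average_rating):
    """Map an average rating on a five-point scale to a sentiment label."""
    if average_rating >= 4.0:
        return "positive"
    if average_rating <= 2.0:
        return "negative"
    return "neutral"


def theme_sentiment(reviews, theme_keywords):
    """Assign each theme a sentiment from the ratings of reviews that mention it.

    reviews: iterable of (text, rating) pairs.
    theme_keywords: mapping of theme -> set of lowercase keywords; different
    reviews may mention different keywords belonging to the same theme.
    """
    ratings = defaultdict(list)
    for text, rating in reviews:
        words = set(re.findall(r"[a-z']+", text.lower()))
        for theme, keywords in theme_keywords.items():
            if words & keywords:
                ratings[theme].append(rating)
    return {theme: label(sum(r) / len(r)) for theme, r in ratings.items()}
```

Given a five star review reading “New Mexico is not known for its sushi” and a four star review mentioning “fries,” a “food” theme whose keywords include both “sushi” and “fries” would be labeled positive under these assumptions, even though the first sentence reads as negative in isolation.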

In the example shown in FIG. 30, Alice is viewing an overview map of all ACME stores that indicates how the stores are perceived with respect to customer service. In some embodiments, each of the headings included in region 3036 is an example of a theme (e.g., “Environment” and “Speed”). In other embodiments, themes are the most common terms with respect to a given category (e.g., with “Knowledgeable” and “Rude” being examples of themes in the category of customer service). In some embodiments, both the keywords and any parents of the keywords in a hierarchy are considered to be themes, with some themes being more specific (e.g., “dirty floor”) than others (e.g., “cleanliness”).

As indicated in region 3004, across all of the 2,000 ACME stores in the United States, the staff at ACME is perceived positively as being nice (3006), knowledgeable (3008), and providing a good returns process (3010). The areas in which ACME is perceived most negatively (with respect to customer service) are that the staff is rude (3012), the checkout process has issues (3014), and that the employees are too busy (3016). The positive and negative terms listed in region 3004 are examples of themes having their indicated respective sentiments.

If Alice clicks on region 3020, she will see the most prevalent positive and negative terms associated with the value provided by ACME. If she clicks on region 3018, she will see the most prevalent positive and negative overall terms associated with ACME, across all reviews. In some embodiments, the types of themes that are presented in interface 3000 are pre-selected, whether based on a template, based on the selections of an administrator, or otherwise selected, such as based on the industry of the reviewed entity. A car dealership, for example, can be evaluated with respect to “parts department” oriented themes, while a restaurant can be evaluated with respect to “food” oriented themes (without evaluating the restaurant with respect to parts or the dealership with respect to food). Both types of business can be evaluated with respect to common business elements (e.g., “cleanliness” and/or “value”). As another example, Alice can customize which types of themes are presented in interface 3000. In other embodiments, which themes are presented in interface 3000 depends, at least in part, on the review information associated with the entity. For example, as will be described in more detail below, themes can be organized into hierarchies. Those themes in the hierarchy that are more prevalent in reviews can be surfaced automatically in addition to/instead of being included (e.g., in region 3036) by default.

Interface 3000 depicts, in region 3022, the top rated states (with respect to customer service) and the most common positive (3024) and negative (3026) terms that appear in their respective reviews. If Alice clicks on icon 3038, the bottom ranked states (and their terms) will be displayed first.

Map 3028 depicts, based on color, whether the stores in a given state are viewed, with respect to customer service, positively (e.g., 3030), negatively (e.g., 3032), or neutrally (e.g., 3034). Suppose Alice clicks on California (3032). She will then be presented with interface 3100 as illustrated in FIG. 31, which includes a more detailed view for the ACME stores in that state. As with region 3036 of FIG. 30, region 3102 depicts summary information with respect to overall perception (3104), and perception within six specific areas (3106). In particular, region 3104 shows that ACME's California stores are ranked 39th in the country, and that overall, the most positive aspects of the California stores are that shopping at them is fast and convenient, and that the stores have a good selection. Overall, the most negative aspects of the California stores are that employees are rude, shoppers are kept waiting, and the stores are dirty.

In region 3108, the highest ranked stores in California are listed, along with their respective most prevalent positive and negative terms. If Alice clicks on icon 3110, the worst ranked stores will be listed first. Alice can see the individual reviews mentioning a given term, for a given store, by clicking on the term shown in region 3108. As one example, suppose Alice would like to see the reviews that mentioned ACME's “friendly” clerks at the store located on Highway 1. She clicks on region 3112 and is presented with the popup displayed in interface 3200 in FIG. 32.

According to region 3206 of interface 3200, a total of 21 reviews of the ACME store located at 140 Highway 1 in California contain the word “friendly.” The reviews are sorted in reverse date order, and the term, “friendly,” is highlighted in each review (e.g., at 3202 and 3204).

In some cases, particularly where information for a specific location is reviewed, surprising results may occur. As one example, a given store may have an employee (e.g., “Jeff”) who is mentioned multiple times in reviews. Using the techniques described herein (e.g., the NLP processing techniques described below), keywords such as “Jeff” will surface as themes. Where the theme has a positive sentiment, this can indicate that Jeff is a great employee. Where the theme has a negative sentiment, this can indicate that Jeff is a problematic employee. As will be described in more detail below, smoothing techniques can be applied so that where a company has received only a handful of reviews about Jeff, he will not surface as a “theme.” As another example, in most parts of the United States, a review of a hotel or an apartment that includes the word, cockroach, is highly likely to be expressing negative sentiment. People typically think about/mention cockroaches only when they have had a negative experience. In the Southeast, however, the mere presence of the term, cockroach, does not mean that the reviewer is authoring a negative review. In a region full of palmetto bugs, the author might be commenting favorably on how the hotel manager or landlord has managed the presence of such creatures.

FIG. 33 illustrates an alternate example of a popup display of reviews including a term. In particular, interface 3300 shows, to an administrator of a car dealership franchise's account on platform 102, reviews at various locations that include the term, “tactic.” As indicated by the star ratings accompanying the reviews, the term, “tactic,” is present in both positive (e.g., 3302) and negative (e.g., 3304) reviews.

Returning to FIG. 30, if Alice clicks on tab 3040, she will be presented with interface 3400 as shown in FIG. 34. Interface 3400 displays, for each ACME store, numerical indications of each store's average rating with respect to each theme (or category of themes, as applicable). If Alice clicks on tab 3042 of interface 3400, she will see ACME's data compared against the data of competitor convenience stores. In various embodiments, Alice can specify what types of competitor data should be shown. For example, Alice can compare ACME's ratings with respect to given themes against industry averages and/or against specific competitors. This can be particularly insightful in certain industries, such as telephone carriers, or airlines, where people frequently write reviews only when they are upset. Themes of “broken charger” or “lost baggage” are likely to be surfaced, with negative sentiment, for any business in the industry. Being able to determine whether the number of complaints/severity of negative sentiment pertaining to baggage handling is higher or lower than that of complaints made about competitors may be more useful to a representative of a company than merely knowing that people are unhappy about a given aspect.

Further, Alice can specify location constraints on the competitor information, such as by specifying that she would like to compare all ACME stores against competitor stores in Denver. She can also specify that she would like to compare ACME California stores against the industry average in California (or the industry average in Texas). In some embodiments, additional tabs are included in interface 3400, for example, ones allowing Alice to compare ACME stores against one another (e.g., based on geography) and also to compare the same stores over time (e.g., determining what the most positively and negatively perceived themes were in one year vs. another for a store, a group of stores, and/or competitor/industry information).

Returning to FIG. 31, if Alice clicks on one of the addresses listed in column 3114, she will be presented with interface 3500 as shown in FIG. 35. Interface 3500 displays, for the specific ACME store she clicks on, the top positive terms and negative terms for the store (across each of the themes), associated reviews, and scores. Additional information is also presented, such as the store's rank across all other ACME stores (3502).

Assigning Sentiment to Themes

FIG. 36 illustrates an embodiment of a process for assigning sentiment to themes. In some embodiments, process 3600 is performed by theme engine 434. In portions of the following discussion, a single review will be described. However, portions of process 3600 can be repeated with respect to several, or all, reviews of an entity, whether in parallel, or in sequence. The process begins at 3602 when reputation data is received. In particular, a review having text and an accompanying score is received at 3602. One example of review text is, “The toiletries are the best thing at Smurfson Hotels,” with a score provided by the author of the review of 5.

In some embodiments, reputation data is received by system 102 in conjunction with the processing performed at 506 in process 500. In this scenario, process 3600 is performed when/as data is ingested into system 102. In some embodiments, process 3600 is performed asynchronously to process 500. For example, process 3600 can be performed nightly, weekly, or in response to an arbitrary triggering event (examples of which are described above in conjunction with discussion of FIGS. 4 and 5).

At 3604, a determination of one or more keywords is made, using the review's text. A variety of techniques can be used to make the determination at 3604. As one example, every word in the review (i.e., “The,” “toiletries,” “are,” . . . ) can be treated as a distinct keyword. As another example, varying amounts of natural language processing (NLP) can be employed. For example, articles or other parts of speech can be skipped, only those words that are nouns and adjectives can be extracted as keywords, stemming/normalization can be applied, etc. Additional detail regarding the use of NLP in various embodiments is provided below.
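As a rough illustration of the two keyword-determination approaches mentioned for 3604, the following Python sketch first treats every word as a keyword and then skips a small set of stop words. The function names and the `STOP_WORDS` list are hypothetical and are not part of the described system; an actual embodiment might instead rely on part-of-speech tagging.

```python
import re

# Hypothetical stop-word list (an assumption for illustration only).
STOP_WORDS = {"the", "a", "an", "are", "is", "at", "of", "and"}

def naive_keywords(text):
    """Treat every word in the review as a distinct keyword."""
    return re.findall(r"[A-Za-z']+", text)

def filtered_keywords(text):
    """Lowercase each word and skip articles and other stop words."""
    return [w.lower() for w in naive_keywords(text)
            if w.lower() not in STOP_WORDS]

review = "The toiletries are the best thing at Smurfson Hotels"
print(naive_keywords(review))
print(filtered_keywords(review))
```

The filtered variant keeps content words such as “toiletries” while dropping “The” and “are.”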

In some embodiments, ontologies 436 are used in determining keywords at 3604. Ontologies can be created by an administrator, obtained from a third party (e.g., a parts listing), and/or can be at least partially automatically generated from existing review data (e.g., by performing term frequency analysis, NLP, etc.). In some embodiments, users of system 102 can customize/supplement the ontologies used. For example, if a particular business offers trademarked products for sale, those trademarked goods can be included in an ontology associated with that business. As another example, a master set of terms can be used (e.g., for all/major business types), and refinement sets combined with the master set as applicable (e.g., refinements for hotels; refinements for restaurants). In some cases, such refinements may be added to the master set(s) and used for processing reviews. In other cases, some refinements may override portions of the master set(s). As yet another example, blacklists (whether global, industry specific, or specific to a given company) can be used to exclude certain terms from consideration as keywords at 3604. Examples of excerpts of ontologies are depicted in FIGS. 37A and 37B.

FIG. 37A is an excerpt of an ontology for use in processing reviews of medical practices. The ontology includes substitutions (e.g., synonyms and typo corrections), and is hierarchical. For example, if a reviewer uses the term “physician,” “doc,” “MD,” or “docktor,” in a review (3702), theme engine 434 will substitute the term, “doctor” in its processing (i.e., as if the author had used the term, doctor). Substitutions are indicated in FIG. 37A as pairs where the right item appears in lowercase. In the case of an ontology for a car dealer, terms such as “car,” “cars,” “automobile,” “automobiles,” and “autos,” could similarly be collapsed.

Other terms are not necessarily synonyms (though they can be), but refer to or are associated with the same concept within a hierarchy (also referred to herein as a “category” and a “type of theme”). As one example, review comments that refer to the “lobby,” “reception,” “waiting area,” and “magazines” (3704) each refer to an aspect of the front portion of a medical practice. As another example (not shown), the terms “price,” “bargain,” “ripoff,” “cost,” “charged,” and “bill” can all be treated as references to the value provided by a business.

The hierarchical relationship between terms in the ontology is indicated in FIG. 37A as pairs where the right item is denoted in uppercase. As shown in region 3706, any reviews pertaining to “PARKING,” “BATHROOM,” or “LOBBY,” pertain (more generally) to the “ENVIRONMENT” of a medical practice.
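One way to picture the substitution pairs and hierarchy pairs described for FIG. 37A is as two lookup tables. The Python sketch below is a minimal illustration under that assumption; the table contents are a hypothetical excerpt reconstructed from the examples in the text, and the names `SUBSTITUTIONS`, `PARENTS`, `normalize`, and `ancestors` are invented for illustration.

```python
# Substitution pairs: the right-hand item is lowercase (synonyms, typo fixes).
SUBSTITUTIONS = {
    "physician": "doctor",
    "doc": "doctor",
    "md": "doctor",
    "docktor": "doctor",
}

# Hierarchy pairs: the right-hand item is an uppercase category.
# Placing "reception" etc. under LOBBY is an assumption about FIG. 37A.
PARENTS = {
    "lobby": "LOBBY",
    "reception": "LOBBY",
    "waiting area": "LOBBY",
    "magazines": "LOBBY",
    "LOBBY": "ENVIRONMENT",
    "PARKING": "ENVIRONMENT",
    "BATHROOM": "ENVIRONMENT",
}

def normalize(term):
    """Apply substitutions as if the author had used the canonical term."""
    lowered = term.lower()
    return SUBSTITUTIONS.get(lowered, lowered)

def ancestors(term):
    """Return the chain of parent categories for a term, nearest first."""
    chain = []
    node = normalize(term)
    while node in PARENTS:
        node = PARENTS[node]
        chain.append(node)
    return chain

print(normalize("Docktor"))    # doctor
print(ancestors("magazines"))  # ['LOBBY', 'ENVIRONMENT']
```

Keeping substitutions and hierarchy in separate tables makes it straightforward to layer refinement sets or blacklists on top of a master set.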

FIG. 37B is an excerpt of an ontology for use in processing reviews of a specific restaurant. Some of the terms associated with the “FOOD” category are common ingredients, such as “mayo” (3708) and “pickle” (3710). Other entries are generic names for menu items such as “apple pie” (3712) and yet other entries are trademarked names for items unique to the specific restaurant, such as “BlueCool” and “SpiffBurger” (3714). Yet other “FOOD” words are not nouns, but are instead adjectives that reflect how people perceive food, such as that it is “bland,” “burnt,” “salty,” and “watery” (3716). The remaining examples of “FOOD” words shown in FIG. 37B are even more conceptual, such as “addictive” and “artery” (clogging) (3718). Terms associated with other categories are also shown, such as terms pertaining to the environment at the restaurant and the service provided by the restaurant. Note that in some cases, antonyms are included in the ontology. For example, both “clean” and “dirty” (3720 and 3722) are categorized as pertaining to “ENVIRONMENT.” And, both “polite” and “rude” (3724 and 3726) are categorized as pertaining to “SERVICE.”

The lists of words included in the ontologies of FIGS. 37A and 37B are example excerpts. In practice, ontologies can include significantly more terms. As one example, an ontology for use with car repair businesses could include, by name, every part of a car (e.g., to help analyze reviews referring to specific parts, such as “my gasket broke,” or “I needed a replacement carburetor”). Further, the same term can be differently associated with different themes, such as based on industry usage. As one example, “patient” in the ontology of FIG. 37A (3730) is placed in a “PATIENT” hierarchy, referring to the customer of a doctor. “Patient” in the ontology of FIG. 37B (3728) is placed in the “SERVICE” hierarchy, referring to the patience of staff (or the patience of patrons).

Returning to the process of FIG. 36, once keywords are determined (3604), sentiment is assigned for one or more themes associated with the keywords based at least in part on the review score. A variety of techniques can be used to assign sentiment. One example is discussed in conjunction with FIG. 38.

FIG. 38 illustrates an example of sentiment being assigned to themes based on three reviews. In particular, the ontology shown in FIG. 37B is used to identify keywords in the reviews (i.e., the processing of 3604). Those terms appearing in the ontology have been underlined in FIG. 38. Attached to each underlined term (with dotted lines) is a pair of terms and values. Using term 3802 as an example, the term, “SpiffBurger” was located in Review A. Review A is a 3 star review. For Review A, the term, “SpiffBurger,” is assigned 3 stars, as is the “FOOD” category to which it belongs. The term, “pickles,” is also assigned 3 stars, as is the “FOOD” category to which “pickles” belongs. Thus, each term included in the review that is also in the ontology shown in FIG. 37B is assigned a value that corresponds to the overall review rating provided by the author of the review (i.e., “3 stars,” or “neutral”). Further, any parents/grandparents in the hierarchy (i.e., “FOOD”) of those terms are also assigned the overall review rating (i.e., for Review A, “FOOD” receives a value of “3 stars” or “neutral”).

Review B is a 2 star review. In Review B, in addition to terms associated with FOOD, terms associated with ENVIRONMENT are present. Each of the underlined terms is assigned a value that corresponds to the overall review rating provided by the author of the review (i.e., “2 stars” or “negative”). Further, “FOOD” and “ENVIRONMENT” are also assigned a score of 2.

Review C is a 5 star review. In Review C, in addition to terms associated with FOOD, terms associated with VALUE and SERVICE are present. Each of the underlined terms, and those categories to which the terms belong, are assigned a value of 5. Note that the reviewed “SpiffBurger” was not to the reviewer's liking. However, it (and FOOD) received a score of “5 stars” (or “positive”) because the overall review was a 5.

As mentioned above, a variety of techniques can be used to assign sentiment to themes (3606). As one example, the point value assigned to each term (e.g., “SpiffBurger”) and to any parents of a term (e.g., “FOOD”) could be summed and then subjected to additional processing such as normalization and/or the application of thresholds. Using the example of FIG. 38, suppose each mention is assigned the rating score of the review in which it appears, and then an average across all mentions is taken. The theme, “SpiffBurger,” would have a (positive) sentiment score of 4 (3 points awarded from the first review, 5 points awarded from the third review, and an average of 8/2=4). The term, “apple pie,” would have a (negative) sentiment score of 2 (2 points awarded from the second review, a single review). The term, “pickles,” would have a (neutral) score of 3 (3 points awarded from the first review, a single review). Since the terms “apple pie” and “pickles” each appear in only a single review, in some embodiments those terms are excluded from being considered “themes,” because an insufficient number of reviewers have seen fit to comment on them.
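The per-term averaging described above can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the function name `term_scores` and the `min_reviews` threshold parameter are hypothetical, and each (term, rating) pair is assumed to represent one review in which the term appears.

```python
from collections import defaultdict

def term_scores(mentions, min_reviews=1):
    """Average the review ratings attached to each term.

    mentions: list of (term, rating) pairs, one per review mentioning the term.
    min_reviews: hypothetical threshold; terms seen in fewer reviews are
    excluded from consideration as themes.
    """
    ratings_by_term = defaultdict(list)
    for term, rating in mentions:
        ratings_by_term[term].append(rating)
    return {
        term: sum(ratings) / len(ratings)
        for term, ratings in ratings_by_term.items()
        if len(ratings) >= min_reviews
    }

# Mentions taken from the FIG. 38 discussion: Review A (3 stars),
# Review B (2 stars), and Review C (5 stars).
mentions = [
    ("SpiffBurger", 3),
    ("pickles", 3),
    ("apple pie", 2),
    ("SpiffBurger", 5),
]

print(term_scores(mentions))                 # all terms
print(term_scores(mentions, min_reviews=2))  # only terms in 2+ reviews
```

With `min_reviews=2`, only “SpiffBurger” survives, matching the embodiments in which single-review terms are not treated as themes.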

The score for the concept, FOOD, can also be determined in a variety of ways. As one example, because two distinct food items are mentioned in the first review, the value for FOOD could be counted twice for that review (i.e., (3+3 (for Review A) + 2+2 (for Review B) + 5 (for Review C)) / 5 mentions = 3). As another example, multiple mentions within a single review of a term (or its parent categories, by extension) could be collapsed into a single instance. In this scenario, FOOD would receive a total raw score of (3+2+5)/3, or approximately 3.3. FIG. 39 illustrates an example of a process for assigning a sentiment to a theme. In particular, process 3900 can be used to assign a sentiment to the theme, FOOD, based on the presence of keywords such as “SpiffBurger” and “salty” across multiple reviews.
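The two ways of scoring the FOOD category can be compared with a short Python sketch. The function name `category_score` and the `collapse_per_review` flag are hypothetical, and the sketch assumes that every mention within a given review carries that review's overall rating, as in FIG. 38.

```python
def category_score(mentions, collapse_per_review):
    """Average the ratings contributed to a category.

    mentions: list of (review_id, rating) pairs, one per keyword mention
    that belongs to the category.
    collapse_per_review: if True, multiple mentions within one review are
    collapsed into a single instance before averaging.
    """
    if collapse_per_review:
        # dict() keeps one rating per review_id.
        ratings = list(dict(mentions).values())
    else:
        ratings = [rating for _, rating in mentions]
    return sum(ratings) / len(ratings)

# FOOD mentions from the FIG. 38 discussion: two in Review A (3 stars),
# two in Review B (2 stars), and one in Review C (5 stars).
food = [("A", 3), ("A", 3), ("B", 2), ("B", 2), ("C", 5)]

print(category_score(food, collapse_per_review=False))  # (3+3+2+2+5)/5
print(category_score(food, collapse_per_review=True))   # (3+2+5)/3
```

Collapsing per review prevents a single long review that names many food items from dominating the category score.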

Returning to process 3600, after the scores have been computed, those themes with the highest scores are the most “positive” themes, and those with the lowest scores are the most “negative” themes. Additional approaches to assigning sentiment are described below.

Smoothing of Positivity

A variety of alternate and/or more sophisticated scoring approaches can also be used to assign sentiment to themes at 3606. As one example, every keyword extracted from a set of reviews (e.g., per 3604) can be given a “Positivity score” based on the number of “Pos”itive (4 or 5 stars), “Neut”ral (3 stars), and “Neg”ative (1 or 2 stars) reviews as follows:

Positivity=(5+Pos+0.5*Neut)/(10+Pos+Neut+Neg).

This counts each Pos review as 1 positive vote and each Neut review as ½ of a positive vote. A presumption exists that each item begins with 5 positive votes and 5 negative votes. That way, items with a high percentage of positive or negative reviews will not return extreme values of positivity if the number of reviews is small. A table of example positivity calculations is shown in FIG. 40.

FIGS. 41A-41C are portions of tables of themes and scores for an example restaurant. The first column in each table lists keyword/parent categorizations (e.g., obtained at 3604 for all reviews of the restaurant). The second column of each table lists the number of positive reviews in which the term (or its child) appears. The third column of each table lists the number of neutral reviews in which the term (or its child) appears. The fourth column of each table lists the number of negative reviews in which the term (or its child) appears. The fifth column of each table lists the total number of reviews in which the term (or its child) appears. The final column is a positivity calculation for the term (e.g., in accordance with the formula given above or other appropriate techniques).

FIG. 41A lists the most common themes across all reviews of the restaurant, irrespective of sentiment. The table is sorted on column five. Terms related to “FOOD” (4102) were the most prevalent (present in a total of 628 reviews: 212 positive, 134 neutral, and 282 negative). “FOOD” has a positivity score of 0.45.
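The positivity formula given above can be checked against the “FOOD” counts reported for FIG. 41A. The Python sketch below is illustrative only; the function name `positivity` is an assumption, while the constants 5 and 10 come directly from the formula (5 presumed positive votes and 5 presumed negative votes).

```python
def positivity(pos, neut, neg):
    """Smoothed positivity: each item starts with 5 positive and 5 negative votes."""
    return (5 + pos + 0.5 * neut) / (10 + pos + neut + neg)

# FOOD in FIG. 41A: 212 positive, 134 neutral, 282 negative reviews.
print(round(positivity(212, 134, 282), 2))  # 0.45

# With no reviews at all, the score sits at the neutral midpoint.
print(positivity(0, 0, 0))  # 0.5

# Two positive reviews alone do not produce an extreme score.
print(round(positivity(2, 0, 0), 2))  # 0.58
```

The last two calls show the smoothing effect: a term with only a couple of reviews stays close to 0.5 rather than jumping to 1.0.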

FIG. 41B lists the most prevalent negative themes in reviews, as sorted by positivity score. The most notorious aspect of the restaurant is its “management” (4104), which appears in a single positive review, three neutral reviews, and thirty-eight negative reviews. The next most notorious aspect of the restaurant is the rudeness of its employees (4106).

FIG. 41C lists the most prevalent positive themes in reviews, as sorted by positivity score. Reviewers like the restaurant's “Tuesday” offerings the most (4108), followed by the beers the restaurant has on tap (4110).

In some embodiments, additional processing is performed prior to using information such as is shown in FIGS. 41A-41C as input to interfaces/reports such as are shown in FIG. 30. As one example, an administrator reviewing the table shown in FIG. 41C may decide some of the terms, such as “yum” (4112) and “yummy” (4114), should be collapsed into a single term (e.g., “yum”) or merged with an existing term (e.g., “tasty”). The administrator might also decide that certain terms aren't probative (i.e., are vacuous terms) and should be removed (e.g., “yum” and “yummy” should be ignored). Additional examples of vacuous terms include terms such as “experience,” “day,” and “time.” Such modifications can be accomplished in a variety of ways. For example, the administrator can edit the ontology to map “yum” and “yummy” to “tasty.” The administrator can also create or edit an existing blacklist to include those terms, so that they are not used as themes in the future. In some embodiments, system 102 makes available an interface that allows an end user, such as Alice, to manipulate which terms are included (e.g., in an ontology) or excluded (e.g., in a blacklist) without needing administrator privileges.

Natural Language Processing

In some embodiments, theme engine 434 is configured to use NLP, such as to identify themes and to perform review deduplication. As one example, theme engine 434 can be configured to use the GATE modules ANNIE and OpenNLP, in conjunction with performing additional NLP processing.

FIG. 42 illustrates an example of a sentence included in a review. The sentence, “The toiletries are the best thing at Smurfson Hotels” is processed by three NLP engines. The processing performed by ANNIE is shown in region 4202. Each line represents a “token,” a unit of meaning which is a word or a phrase that has a single meaning. “Surface” is the word exactly as it appears in the review. “Lemma” is the dictionary form of the word (e.g., the singular form of a noun or infinitive of a verb). “POS” is the Part of Speech, from a set of tags in the Penn Treebank Tag Set. “Entity” is the Named Entity type, which is given only to proper nouns. These types are: Person, Location, Organization, Date, JobTitle, or Unknown. Instead of or in addition to using existing keyword ontologies, in some embodiments theme engine 434 is configured to use NLP techniques to identify keywords. For example, the output of ANNIE can be used to generate a list of keywords, e.g., based on parts of speech, and used by theme engine 434 in conjunction with process 3600 or 3900.

The processing performed by OpenNLP is shown in region 4204. The “S” line represents a clause, which is a larger unit of structure that has at least a subject and a predicate, a thing doing something. The remaining lines are phrases, which serve distinct roles in the clause. These are shown preceded by tags which are also from the Penn Treebank Tag Set. The indentation shows the hierarchical structure by which a phrase is a component of another phrase.

Finally, additional processing performed by theme engine 434 is shown in region 4206. The analysis performed in region 4206 turns the OpenNLP analysis into “Subject Verb Object” structure. In the example shown, the “Agent” is similar to the subject of a clause, the “Predicate” is similar to the verb, and the “Patient” is similar to the direct object. Additional examples of processing performed on two additional sentences are shown in FIGS. 43 and 44.

Deduplication

In some embodiments, theme engine 434 is configured to perform deduplication on reviews (e.g., prior to determining sentiments for themes). Deduplication can be performed to minimize the ability of reviewers to spam system 102 with duplicate reviews/reviews that reuse phrases. A business might seek to bolster its reputation by creating several artificial positive reviews for itself. A business might also seek to discredit a competitor by creating several artificial negative reviews for the competitor. Duplicate reviews may be wholesale copies of one another, or may have slight alterations, e.g., a different introduction or conclusion, but with common sentences/clauses.

In some embodiments, deduplication is performed as follows. An identifier is assigned to each specific sentence and clause. One way to do this is to use a low-level Java operator that hashes each string such that any two different strings are unlikely to have the same resulting hash. Each item extracted from a review is assigned a hash for the sentence from which it was derived, and, if a clause structure is successfully identified, another hash is generated for the clause.

Extractions from the sample sentences depicted in FIGS. 42-44 are shown in FIG. 45. In various embodiments, when processes such as processes 3600 and 3900 are performed, and/or when the data feeding reports such as are shown in interface 3000 is collected, review deduplication is performed. In particular, items are counted on the basis of the number of occurrences that are unique in all fields. Therefore, six extractions for NOM-Smurfson Hotels-neut with different hash codes count as six such items. If either hash code is the same for the six extractions, they will only be counted as a single item, preventing duplicate text from being counted multiple times.
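The hash-based counting described above might look roughly like the following Python sketch. It is a simplified illustration under several assumptions: SHA-256 from Python's `hashlib` stands in for the Java hashing mentioned in the text, strings are lowercased and whitespace-normalized before hashing (a choice not stated in the source), and the names `text_hash` and `count_items` and the sample sentences are hypothetical.

```python
import hashlib

def text_hash(text):
    """Hash a sentence or clause after light normalization (an assumption)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def count_items(extractions):
    """Count items, collapsing extractions that share a sentence or clause hash.

    extractions: iterable of (item, sentence, clause) tuples, where clause
    may be None if no clause structure was identified.
    """
    seen = {}    # item -> set of ("sentence" | "clause", hash) already seen
    counts = {}  # item -> number of distinct occurrences
    for item, sentence, clause in extractions:
        hashes = {("sentence", text_hash(sentence))}
        if clause is not None:
            hashes.add(("clause", text_hash(clause)))
        known = seen.setdefault(item, set())
        if not (hashes & known):
            # Neither hash has been seen for this item, so it is new text.
            counts[item] = counts.get(item, 0) + 1
        known |= hashes
    return counts

item = "NOM-Smurfson Hotels-neut"
extractions = [
    (item, "The toiletries are the best thing at Smurfson Hotels.",
     "The toiletries are the best thing"),
    # Same sentence with different spacing and case: same sentence hash.
    (item, "the toiletries are the best thing at  Smurfson Hotels.",
     "The toiletries are the best thing"),
    # Different sentence that reuses the same clause: same clause hash.
    (item, "Honestly, the toiletries are the best thing at Smurfson Hotels, trust me.",
     "The toiletries are the best thing"),
    # Unrelated sentence with no identified clause.
    (item, "Smurfson Hotels was fine.", None),
]

print(count_items(extractions))
```

In this sketch the first three extractions collapse into a single occurrence because they share a sentence hash or a clause hash, while the fourth is counted separately.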

Although the foregoing embodiments have been described in some detail for purposes of clarity of understanding, the invention is not limited to the details provided. There are many alternative ways of implementing the invention. The disclosed embodiments are illustrative and not restrictive.

What is claimed is:
 1. (canceled)
 2. A system, comprising: a processor configured to: receive reputation data extracted from at least one data source, wherein the reputation data includes a plurality of user-authored reviews; determine that a first item is present in a first review in the plurality of user-authored reviews; assign to the first item an identifier that is generated based at least in part on a string from which the first item was derived; in response to determining that an identifier assigned to a second item is the same as the identifier assigned to the first item, count the first item and the second item as a single item; determine that a third item, different from but associated with the first item, is present in a second review in the plurality of user-authored reviews; determine a sentiment for a theme based on the presence of the first and third items; and provide as output a report that indicates the sentiment for the theme; and a memory coupled to the processor and configured to provide the processor with instructions.
 3. The system of claim 2, wherein the identifier for the first item is generated based at least in part on a sentence from which the first item was derived.
 4. The system of claim 2, wherein the identifier for the first item is generated based at least in part on a clause.
 5. The system of claim 2, wherein the identifier for the first item is generated at least in part by performing a hash of a string.
 6. The system of claim 5, wherein determining that the identifier assigned to the second item is the same as the identifier assigned to the first item comprises comparing one or more hash codes.
 7. The system of claim 2, wherein determining that the first item is present includes performing natural language processing.
 8. The system of claim 2, wherein the first and second reviews are reviews of an entity.
 9. The system of claim 8, wherein the processor is further configured to determine whether the first item is present in an ontology associated with the entity.
 10. The system of claim 2, wherein the processor is further configured to select an ontology associated with an entity being reviewed in the first review.
 11. The system of claim 2, wherein the processor is further configured to select, from a plurality of ontologies, an ontology that is associated with an industry associated with an entity being reviewed in the first review.
 12. A method, comprising: receiving reputation data extracted from at least one data source, wherein the reputation data includes a plurality of user-authored reviews; determining that a first item is present in a first review in the plurality of user-authored reviews; assigning to the first item an identifier that is generated based at least in part on a string from which the first item was derived; in response to determining that an identifier assigned to a second item is the same as the identifier assigned to the first item, counting the first item and the second item as a single item; determining that a third item, different from but associated with the first item, is present in a second review in the plurality of user-authored reviews; determining a sentiment for a theme based on the presence of the first and third items; and providing as output a report that indicates the sentiment for the theme.
 13. The method of claim 12, wherein the identifier for the first item is generated based at least in part on a sentence from which the first item was derived.
 14. The method of claim 12, wherein the identifier for the first item is generated based at least in part on a clause.
 15. The method of claim 12, wherein the identifier for the first item is generated at least in part by performing a hash of a string.
 16. The method of claim 15, wherein determining that the identifier assigned to the second item is the same as the identifier assigned to the first item comprises comparing one or more hash codes.
 17. The method of claim 12, wherein determining that the first item is present includes performing natural language processing.
 18. The method of claim 12, wherein the first and second reviews are reviews of an entity.
 19. The method of claim 18, further comprising determining whether the first item is present in an ontology associated with the entity.
 20. The method of claim 12, further comprising selecting an ontology associated with an entity being reviewed in the first review.
 21. The method of claim 12, further comprising selecting, from a plurality of ontologies, an ontology that is associated with an industry associated with an entity being reviewed in the first review.
 22. A computer program product embodied in a non-transitory computer readable storage medium and comprising computer instructions for: receiving reputation data extracted from at least one data source, wherein the reputation data includes a plurality of user-authored reviews; determining that a first item is present in a first review in the plurality of user-authored reviews; assigning to the first item an identifier that is generated based at least in part on a string from which the first item was derived; in response to determining that an identifier assigned to a second item is the same as the identifier assigned to the first item, counting the first item and the second item as a single item; determining that a third item, different from but associated with the first item, is present in a second review in the plurality of user-authored reviews; determining a sentiment for a theme based on the presence of the first and third items; and providing as output a report that indicates the sentiment for the theme.